
Meta announces text-to-speech generative AI model ‘Voicebox’

Photo Credit: Pixabay

Sohini Bagchi

20 Jun, 2023

Facebook's parent company Meta has announced Voicebox, a generative artificial intelligence (AI) model that converts text to speech and includes features to edit audio and work across languages.

In an Instagram post shared by Meta CEO Mark Zuckerberg on Friday, a video showed how Voicebox could read out text in a variety of vocal styles, remove noisy distractions from audio tracks, learn and replicate speakers’ voices, and even produce output in different languages.

Meta researchers said that the system has been trained on more than 50,000 hours of unfiltered audio. Specifically, Meta used recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese. That diverse data set, according to Meta researchers, allows the system to “generate more conversational sounding speech, regardless of the languages spoken”.

According to Zuckerberg, Voicebox can synthesize speech using a two-second audio sample. With that clip, it can match the audio style as well as do text-to-speech generation or re-create a portion of the speech that may have been interrupted by some external noise.

Meta claimed that the speech generator is so “powerful” that it can “outperform all existing models”, and that it can generate voices as easily as ChatGPT can generate text and Bing or Dall-E 2 can create images. That said, while other generative AI platforms like ChatGPT and Google's Bard generate text in response to a query using natural language processing and machine learning, Meta's new generative AI, Voicebox, produces audio clips.

“Our results show that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech,” said the researchers, claiming that training on the computer-generated speech results in just a 1% error rate degradation, compared to the 45% to 70% drop-off seen with existing text-to-speech (TTS) models.

Voicebox can also be used to give a natural-sounding voice to virtual assistants or characters in the metaverse, the digital worlds in which people will gather to work, play and hang out. It can further be used by visually impaired people to hear messages read in the voices of their friends and loved ones, the company said.

Voicebox is still a work in progress and not yet available to the public. Meta said it recognises the potential for this AI to be used to cause harm and is working on an effective way to distinguish between authentic speech and audio generated by Voicebox.

The launch of ChatGPT in November 2022 sparked interest in generative AI among enterprises and prompted almost every IT major to develop new offerings in this space. Besides Microsoft, Google and Nvidia, which are aggressively dominating the AI space, Adobe announced plans earlier in June for a new generative AI subscription plan, and Cisco Systems announced new AI tools for its WebEx video conferencing software.
