Meta announces text-to-speech generative AI model ‘Voicebox’

Photo Credit: Pixabay

Facebook's parent company Meta has announced Voicebox, a generative artificial intelligence (AI) model that converts text to speech, edits audio and works across languages.

In an Instagram post shared by Meta CEO Mark Zuckerberg on Friday, a video showed how Voicebox could read out text in a variety of vocal styles, remove noisy distractions from audio tracks, learn and replicate speakers’ voices, and even produce output in different languages.

Meta researchers said that the system has been trained on more than 50,000 hours of unfiltered audio. Specifically, Meta used recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese. That diverse data set, according to Meta researchers, allows the system to “generate more conversational sounding speech, regardless of the languages spoken”.

According to Zuckerberg, Voicebox can synthesize speech using a two-second audio sample. With that clip, it can match the audio style as well as do text-to-speech generation or re-create a portion of the speech that may have been interrupted by some external noise.

Meta claimed that the speech generator is so “powerful” that it can “outperform all existing models”, and that it can generate voices as easily as ChatGPT can generate text and Bing or Dall-E 2 can create images. Whereas other generative AI platforms such as ChatGPT and Google's Bard use natural language processing and machine learning to generate text in response to a query, Meta's new generative AI, Voicebox, produces audio clips.

“Our results show that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech,” said researchers, claiming that such models show just a 1% error rate degradation, compared with the 45% to 70% drop-off seen with synthetic speech from existing TTS models.

Voicebox can also be used to give a natural-sounding voice to virtual assistants or characters in the metaverse, the digital worlds in which people will gather to work, play and hang out. It can further be used by visually impaired people to hear messages read in the voices of their friends and loved ones, the company said.

Voicebox is still a work in progress and is not yet available to the public. Meta said it recognises the potential for misuse and harm from this AI and is working on an effective way to distinguish between authentic speech and audio generated by Voicebox.

The launch of ChatGPT in November 2022 sparked interest in generative AI among enterprises and prompted almost every IT major to develop new offerings in this space. Beyond Microsoft, Google and Nvidia, which dominate the AI space, Adobe earlier in June announced plans for a new generative AI subscription plan, and Cisco Systems announced new AI tools for its WebEx video conferencing software.
