Lost in translation
Using generative AI to reach a global market
By Doug Cook, 4 Apr 2024
Generative AI is revolutionizing the way the world consumes information and media. Traditional dubbing and subtitling methods, which are often time-consuming and error-prone, are quickly being replaced by AI-powered solutions.
Not only can these intelligent solutions accurately transcribe and translate spoken language in near real time, they can even preserve the nuances and expressions of a speaker. These technologies promise to transform news outlets, education systems, and entertainment, making content instantly accessible to a diverse global audience.
Reinventing babel fish
At thirteen23, we’ve been investigating a number of these technologies, including Microsoft’s Azure AI Speech, OpenAI’s Whisper, Google’s AudioPaLM, Meta’s SeamlessM4T model, and OpenVoice, an open-source model developed by researchers at MIT, Tsinghua University in Beijing, and the startup MyShell.
Collectively, these solutions leverage generative AI to achieve a number of impressive feats, including:
Voice dubbing and cloning
The process of replicating a person’s voice for use in audio or video content, often to replace the original speech with a version in another language or speaking style.
Accurate tone cloning
The ability to replicate the unique tonal characteristics of a specific voice, ensuring the cloned voice retains the distinctive sound and texture of the original.
Flexible voice style control
A feature that allows the manipulation and adjustment of various aspects of a voice, such as pitch, tempo, and emotion, to suit different contexts or preferences.
Cross-lingual voice cloning
The ability to clone a voice in one language and then use that cloned voice to speak in different languages, while maintaining the characteristics of the original voice across languages.
On our client projects, these technologies can support everything from basic audio translation to text-to-speech (TTS) to seamless video translation to and from hundreds of different languages.
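As a rough sketch of how such a workflow might be wired together, a translation job can be treated as three stages: transcribe the source audio, translate the text, and synthesize speech in the cloned voice. Everything named below (`DubbingPipeline`, the stage signatures, and the `fake_*` stand-ins) is a hypothetical illustration, not the API of any tool mentioned above. In practice, each stage would be backed by a real model or service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stage signatures (assumptions for illustration only).
Transcriber = Callable[[bytes], str]         # source audio -> source-language text
Translator = Callable[[str, str], str]       # (text, target language) -> translated text
Synthesizer = Callable[[str, bytes], bytes]  # (text, reference voice sample) -> dubbed audio


@dataclass
class DubbingPipeline:
    """Chains transcription, translation, and voice synthesis."""
    transcribe: Transcriber
    translate: Translator
    synthesize: Synthesizer

    def dub_many(self, audio: bytes, reference_voice: bytes,
                 target_langs: List[str]) -> Dict[str, bytes]:
        # Transcribe once, then reuse the transcript for every target language.
        text = self.transcribe(audio)
        return {
            lang: self.synthesize(self.translate(text, lang), reference_voice)
            for lang in target_langs
        }


# Stand-in stages so the sketch runs without any model installed.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio bytes are the spoken words


def fake_translate(text: str, target_lang: str) -> str:
    return f"[{target_lang}] {text}"  # tag the text instead of translating it


def fake_synthesize(text: str, reference_voice: bytes) -> bytes:
    return text.encode("utf-8")  # pretend the encoded text is the dubbed audio


pipeline = DubbingPipeline(fake_transcribe, fake_translate, fake_synthesize)
dubs = pipeline.dub_many(b"Hello, world", b"reference-sample", ["fr", "es"])
print(sorted(dubs))  # one dubbed track per requested language
```

Keeping the stages as plain callables means a single reference voice sample and a single transcript can be reused across every target language, and any one stage can be swapped for a different model without touching the others.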
Translation using generative AI
The following is a basic video filmed in English, translated into French, and then into several different languages.
Each translation was achieved using cross-lingual voice cloning based on a single, short reference video. Here are a few more examples.
Reaching new markets
While these technologies can still struggle to match the accents and mannerisms of a native speaker, they are highly effective at generating an accurate translation while retaining the voice and tone of the original speaker.
Here’s a more practical example: a translation of one of our favorite talks at SXSW Interactive, John Maeda’s Design in Tech Report.
It’s easy to see just how transformative these technologies will be. A video series or podcast can now be instantly translated for a global market.
But that’s just the tip of the iceberg. Imagine the impact of making health and education content instantly available worldwide, helping rural and underprivileged communities by providing access to world-class education and training typically reserved for wealthier nations.
Having worked with organizations like the Holdsworth Center, Pew Charitable Trusts, and the Bill and Melinda Gates Foundation, we’re particularly excited about this opportunity.
Improving accessibility and inclusivity
In addition to expanding markets, our products and media will also be able to adapt their interfaces and experiences to better meet the needs of their audiences.
For people who are hard of hearing, AI-powered translation can convert spoken language into accurate captions. Similarly, it can convert written content into native audio translations to help people with visual impairments.
These new technologies will help ensure that content is accessible to a wider audience, regardless of location, language, or physical or even cognitive ability.
Doug Cook
Doug is the founder of thirteen23. When he’s not providing strategic creative leadership on our client engagements, he can be found practicing the time-honored art of getting out of the way.