Voice clones — the audio counterpart to deepfaked images — are poised to invade popular media and entertainment.
What’s new: Professionals and amateurs alike are using AI to emulate the voices of human actors, Wired reported.
Cloned like a pro: Game developers and marketers are cloning voices to save money and make their products more immersive.
- Sonantic, a UK-based startup, claims it can reproduce an actor’s voice from less than 20 minutes of training data. Its technology enables media creators to impart a variety of emotional inflections — such as angry, happy, or fearful — at varying levels of intensity. Sonantic shares revenue generated by voice cloning with the human originals.
- U.S.-based Replica Studios trains its system by having actors read 20 short sentences that cover the gamut of English phonetics. The company’s modification of the game Cyberpunk 2077 enables non-player characters to address the player by name. Like Sonantic, Replica shares voice-cloning revenue with human speakers.
- MSCHF, a marketing firm, synthesized the voice of rapper Gucci Mane and put his doppelgänger to work narrating Pride and Prejudice, Don Quixote, and other literary classics.
Remixers join in: Much of the entertainment industry is sorting out who owns which rights to an actor’s voice, but some amateur content creators have embraced the technology with abandon.
- Tim McSmythers, a researcher who goes by the handle Speaking of AI on social media, trained models to mimic the voices of celebrities like Adam Driver, Ellen DeGeneres, and Jerry Seinfeld, then composites the results into famous movie and TV scenes. Our favorite: Homer Simpson telling Anakin Skywalker the legend of Darth Plagueis the Wise in a clip from Star Wars: Revenge of the Sith.
- 15.ai, previously profiled in The Batch, allows users to generate custom dialogue using character voices from My Little Pony, Rick and Morty, and other games and TV shows. (The site is currently on hiatus.)
Why it matters: Voice cloning opens new avenues of creativity and productivity. For instance, generated voices can help developers road-test dialogue before bringing in the human talent and expand the conversational role of background characters. Yet the technology also holds potential for abuse, and guarding against it will require new kinds of vigilance.
We’re thinking: Have you ever been yelled at? We would love to build a system that transcribes the yeller’s words and then re-synthesizes their voice in a more polite tone.
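For fun, here’s a minimal sketch of that transcribe-then-resynthesize idea in Python. It assumes the openai-whisper package for speech recognition and pyttsx3 for generic text-to-speech; a true clone of the yeller’s voice would need a dedicated cloning model like those described above, and the file name yelling.wav is just a stand-in.

```python
# A toy "politeness filter": transcribe angry speech, then read it back calmly.
# Assumes `pip install openai-whisper pyttsx3` and an input file named yelling.wav.
import whisper
import pyttsx3


def politeness_filter(audio_path: str) -> None:
    # Step 1: transcribe the yeller's words with a small Whisper model.
    model = whisper.load_model("base")
    text = model.transcribe(audio_path)["text"].strip()

    # Step 2: re-synthesize the same words with a slower, calmer delivery.
    # (This uses a generic system voice, not a clone of the original speaker.)
    engine = pyttsx3.init()
    engine.setProperty("rate", 140)    # slow the speech down
    engine.setProperty("volume", 0.7)  # keep it quiet
    engine.say(text)
    engine.runAndWait()


if __name__ == "__main__":
    politeness_filter("yelling.wav")
```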