Speaking Your Language: Startup Papercup Offers AI-Powered Voice Translation

Published Jun 29, 2022

A startup that automatically translates video voice overs into different languages is ready for its big break.

What’s new: London-based Papercup offers a voice translation service that combines algorithmic translation and voice synthesis with human-in-the-loop quality control. A recent funding round suggests that investors have a measure of confidence in the company’s approach.

How it works: Video producers can upload clips and specify an output language such as English, Mandarin, Italian, Latin American Spanish, or Brazilian Portuguese. They can choose among synthesized voices that represent a range of genders and ages, tweak a voice’s pitch and character, and set its emotional expression to “happy,” “sad,” “angry,” and the like.

Algorithms convert speech into text and translate it into the target language.
A text-to-speech generator renders the voice over in the new language. It was trained on a combination of third-party and proprietary data.
A native speaker of the output language checks the result and edits it manually if necessary.
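
The steps above can be sketched as a simple pipeline. What follows is a minimal illustration under stated assumptions, not Papercup's actual code: every name in it (DubbingJob, run_pipeline, and the four stage callables) is hypothetical, and re-synthesizing the audio after a reviewer's edit is our guess at how a correction would propagate.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DubbingJob:
    """State for one clip as it moves through the stages (hypothetical structure)."""
    source_audio: bytes
    target_language: str
    transcript: str = ""
    translation: str = ""
    dubbed_audio: bytes = b""
    reviewed: bool = False


def run_pipeline(
    job: DubbingJob,
    transcribe: Callable[[bytes], str],
    translate: Callable[[str, str], str],
    synthesize: Callable[[str, str], bytes],
    review: Callable[[str, bytes], Optional[str]],
) -> DubbingJob:
    """Run speech-to-text, translation, text-to-speech, then human review.

    The stage functions are passed in, so any model or service could stand
    behind them. `review` returns corrected text, or None to approve as is.
    """
    job.transcript = transcribe(job.source_audio)
    job.translation = translate(job.transcript, job.target_language)
    job.dubbed_audio = synthesize(job.translation, job.target_language)

    # Human in the loop: a native speaker checks the result and may edit it.
    correction = review(job.translation, job.dubbed_audio)
    if correction is not None and correction != job.translation:
        # Assumption: an edited translation is rendered again as audio.
        job.translation = correction
        job.dubbed_audio = synthesize(correction, job.target_language)

    job.reviewed = True
    return job


if __name__ == "__main__":
    # Toy stand-ins for the real models and the human reviewer.
    toy = run_pipeline(
        DubbingJob(source_audio=b"hello world", target_language="es"),
        transcribe=lambda audio: audio.decode("utf-8"),
        translate=lambda text, lang: f"[{lang}] {text}",
        synthesize=lambda text, lang: text.encode("utf-8"),
        review=lambda text, audio: None,  # reviewer approves unchanged
    )
    print(toy.translation)
```

Passing the stages in as functions keeps the human check as the final gate regardless of which speech or translation models sit upstream.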

Yes, but: Keeping a human in the loop to oversee an operation as sensitive as language translation makes good sense. However, current technology can take this automation a good deal further. For instance, Papercup offers a selection of voices rather than generating a facsimile of the original voice in a new language. It doesn’t conform video of the speaker’s mouth to new languages: the mouth continues to form words in one language while the synthesized voice intones another. Nor does it demix and remix vocal tracks that are accompanied by background music or other sounds.

Why it matters: Automated voice over translation is yet another task in which machines are vying to edge out human workers. On one hand, automation can make translation available to producers on a tight budget, dramatically extending their reach to new markets and use cases. On the other hand, we worry that performing artists will lose work to such systems, and we support efforts to protect their livelihoods.

We’re thinking: Earlier this week, Nando de Freitas (DeepMind research director, Oxford professor, and former officemate of Andrew Ng’s) urged us on Twitter to translate the newly updated Machine Learning Specialization into every language. We’re working with Coursera’s global translator community to create subtitles, but we’re always eager to have options.
