Everyone Has a Voice Project Relate Offers Synthesized Speech that Works in Real Time

Published

Dec 01, 2021

Reading time

2 min read

Google is improving speech recognition for people who have conditions that affect their ability to pronounce words.

What’s new: Project Relate is an Android app that offers a personalized speech recognition model for people whose speech is impaired or otherwise atypical. It’s designed to understand, transcribe, and synthesize speech for both person-to-person and person-to-machine interactions.

How it works: Researchers trained Relate using over a million speech samples collected from people with conditions including amyotrophic lateral sclerosis, cerebral palsy, Down syndrome, Parkinson’s disease, stroke, or traumatic brain injury. Users can fine-tune the system to their own voices by speaking 500 phrases.

Relate transcribes speech to text in real time, restates what the user says in a synthesized voice, and translates the user’s voice commands for Google Assistant, such as dimming lights or playing music.
It builds on previous Google projects. Parrotron is an LSTM-based sequence-to-sequence model that translates spoken words into synthesized speech. Another, unnamed project allowed researchers to develop fine-tuned speech recognition models using a speech-to-text network known as a recurrent neural network transducer.
Google is recruiting English-speaking volunteers in Australia, Canada, New Zealand, and the U.S. to provide feedback to the beta-test version.

Behind the news: Recognizing the need to make their products more inclusive, tech companies have launched initiatives to make apps more accessible.

Israeli startup Voiceitt developed a smartphone app that translates impaired speech into a synthesized voice for Amazon Alexa. Like Relate, it can be fine-tuned on a user’s voice.
Apple is training Siri to recognize words spoken by people who stutter using audio clips of stuttered speech.
AccessiBe has had mixed results with its accessibility tech, which uses object recognition to generate video captions and transcripts. The company recently was caught up in a lawsuit by disability advocates who said that its technology often mislabels images, making it harder for visually impaired people to navigate the web.

Why it matters: People whose speech is atypical can be excluded from social interactions, have trouble communicating when they need help, and experience difficulty using voice-activated devices. Technology that lets them be heard could make their lives richer, safer, and more engaging.

We’re thinking: Speech recognition is a convenience for most people, but for those with unusual speech patterns, it could be a lifeline.

Subscribe to The Batch