Google is improving speech recognition for people who have conditions that affect their ability to pronounce words.
What’s new: Project Relate is an Android app that offers a personalized speech recognition model for people whose speech is impaired or otherwise atypical. It’s designed to understand, transcribe, and synthesize speech for both person-to-person and person-to-machine interactions.
How it works: Researchers trained Relate on over a million speech samples collected from people with conditions including amyotrophic lateral sclerosis, cerebral palsy, Down syndrome, Parkinson’s disease, stroke, and traumatic brain injury. Users can fine-tune the system to their own voices by recording 500 phrases.
- Relate transcribes speech to text in real time, restates what the user says in a synthesized voice, and passes the user’s voice commands to Google Assistant for tasks such as dimming lights or playing music.
- It builds on earlier Google projects. One, Parrotron, is an LSTM-based sequence-to-sequence model that converts spoken words directly into synthesized speech. Another, unnamed project let researchers build personalized speech recognition models using a speech-to-text architecture known as a recurrent neural network transducer (a rough sketch of this kind of personalization follows this list).
- Google is recruiting English-speaking volunteers in Australia, Canada, New Zealand, and the U.S. to test the beta version and provide feedback.
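
Google hasn’t released Relate’s models or training code, so the following is only a rough sketch of the kind of per-user fine-tuning step described above: a pretrained acoustic encoder is frozen and the output layer is adapted to a user’s recorded phrases. Everything here — the TinyASR class, the personalize function, the user_batches iterable, and the CTC objective standing in for the recurrent neural network transducer the research used — is an assumption for illustration, not Relate’s actual implementation.

```python
# Hypothetical sketch: personalizing a pretrained speech recognizer on a
# user's recorded phrases. Names and architecture are placeholders, not
# Project Relate's actual code.
import torch
import torch.nn as nn

class TinyASR(nn.Module):
    """Toy acoustic model: log-mel frames -> per-frame character logits (CTC-style)."""
    def __init__(self, n_mels=80, hidden=256, vocab_size=29):
        super().__init__()
        self.encoder = nn.LSTM(n_mels, hidden, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(hidden, vocab_size)

    def forward(self, features):            # features: (batch, time, n_mels)
        encoded, _ = self.encoder(features)
        return self.classifier(encoded)     # (batch, time, vocab_size)

def personalize(model, user_batches, epochs=3, lr=1e-4):
    """Fine-tune only the output layer on a user's recorded phrases."""
    for p in model.encoder.parameters():    # freeze the general acoustic encoder
        p.requires_grad = False
    optimizer = torch.optim.Adam(model.classifier.parameters(), lr=lr)
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    model.train()
    for _ in range(epochs):
        for feats, feat_lens, targets, target_lens in user_batches:
            logits = model(feats)                                # (batch, time, vocab)
            log_probs = logits.log_softmax(-1).transpose(0, 1)   # (time, batch, vocab)
            loss = ctc(log_probs, targets, feat_lens, target_lens)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

In practice, Google’s personalized models were reportedly built on a recurrent neural network transducer rather than a CTC head, and the adaptation data would be the roughly 500 phrases each user records in the app.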
Behind the news: Recognizing the need to make their products more inclusive, tech companies have launched initiatives to make apps more accessible.
- Israeli startup Voiceitt developed a smartphone app that translates impaired speech into a synthesized voice for Amazon Alexa. Like Relate, it can be fine-tuned on a user’s voice.
- Apple is using audio clips of stuttered speech to train Siri to recognize words spoken by people who stutter.
- AccessiBe has had mixed results with its accessibility tech, which uses object recognition to generate video captions and transcripts. The company was recently caught up in a lawsuit brought by disability advocates, who said its technology often mislabels images, making it harder for visually impaired people to navigate the web.
Why it matters: People whose speech is atypical can be excluded from social interactions, have trouble communicating when they need help, and experience difficulty using voice-activated devices. Technology that lets them be heard could make their lives richer, safer, and more engaging.
We’re thinking: Speech recognition is a convenience for most people, but for those with unusual speech patterns, it could be a lifeline.