UPDATED: Many Americans use assistive technology to allow them to speak with computer-generated voices. These voices have improved in quality in the last few decades, but are still rather robotic sounding and impersonal.
Teenager Haley Shiber speaks in a polished, mature voice. It comes out of a hot pink tablet computer attached to her wheelchair.
The Smyrna, Del., 16-year-old has cerebral palsy and a comprehensive, degenerative neuromuscular disorder. She uses switches on her wheelchair headrest to tell the speech program on her computer to produce words, jokes, and programmed phrases she can articulate quickly.
“My hobbies are riding my bike, going to see the Phillies, the opera, plays, and 4-H and art,” she said, in one such phrase.
Synthetic voices such as the one Haley uses have become easier on the ear in the past few decades, as the companies creating them have worked to make them easier to understand. But they retain a distinct synthetic quality, one that isolates Haley, according to her mother, Debbie Shiber.
“The roboticness gets in the way of actually developing relationships,” Shiber said. “It’s accepted by us because, you know, that’s just her voice. But it would be wonderful if she had a more natural quality to her voice.”
The problem with synthetic voices is not just that they are, by nature, synthetic sounding. There are a limited number of voices to choose from, which makes it difficult for users of assistive communication devices to find a voice that matches their age and personality. Haley’s voice, for example, sounds a bit too grown-up for a 16-year-old wearing neon pink shoelaces and teal-tinged glasses.
Experts say it gets worse: It is not uncommon for two or three people to be talking with assistive technology, all in the same room together, all in the exact same voice.
Giving Haley her own voice
This is where Tim Bunnell steps in. Bunnell is head of the Speech Research Lab at Nemours/Alfred I. duPont Hospital for Children. His goal is to give people their own voice for the first time.
“What we’re trying to do is develop personal voices for people,” Bunnell said. “So that everyone would have their own unique voice and be able to impose on it an identity that they can identify with.”
Haley Shiber is the first test case for Bunnell’s team.
Years ago, before Haley lost the ability to make any utterances during a critical surgery, Debbie Shiber recorded the sounds her daughter could make.
Recently, Bunnell dusted off those recordings and isolated a pure vowel sound from Haley’s vocalizations. Using software his team developed, he imposed the essence of that pure vowel sound onto a homemade synthetic voice he created using voice samples from a donor child. The resulting voice contained Haley’s voice quality and sounded younger, albeit choppier, than her old voice.
Bunnell recently loaded the new voice onto Haley’s computer when the family visited his Wilmington office.
“As a mother you never forget what your child’s voice sounds like,” said Shiber, who was moved to tears the first time she heard the new voice. “Hearing the voice quality … it was just very emotional, because we haven’t heard Haley’s voice since 2006.”
A new generation of synthetic voices
Bunnell’s team originally developed the software for voice banking, to allow people with Lou Gehrig’s disease, also known as ALS, and similar disorders to quickly create synthetic versions of their own voices for later use.
The technology is currently in beta testing under the name “ModelTalker Speech Synthesis System.” ALS patients and others can record voice samples at home, then send them to the company to be morphed into personalized synthetic voices.
The quality of the resulting voices varies greatly, however, largely because of the amount of data that goes into them. The homemade voices, including the voices for the ALS patients and the one Bunnell created for Haley, are based on about 45 minutes of recorded voice samples. Commercial voices use hours, sometimes dozens of hours, of speech, creating a much smoother voice that is easier to understand.
Before Bunnell gives new voices to any other test patients, he is developing a brand-new approach to create smoother, more professional-sounding personalized voices.
“Rather than record snippets of speech, we will actually have computers modeling how that speech is generated in the vocal tract,” Bunnell said. “So that it is a model, if you will, rather than a copy of the speech.”
Bunnell and a collaborator at Northeastern University will use measurements of vocal tract length, oral cavity width and other data to create this next generation of voices.
For now, the Shibers are glad Haley has a voice to call her own.
“Thank you for my speaking,” she said in her new voice when leaving Bunnell’s office. “You are awesome.”