Automated ticket booking lines could become better at understanding what customers say, thanks to research into speech recognition errors.
Computer scientists have pinpointed the most common speech recognition errors made by automated phone systems, in a bid to improve their accuracy.
The study, led by a scientist at the University, found that computers commonly fail to understand speech when it is peppered with ‘umm’ and ‘err’ sounds.
Also, men tend to be misunderstood more than women when talking to computers, partly because they umm and err more frequently, the research found.
Scientists also found that speech recognition systems commonly fail to identify the first word spoken in a phrase.
Researchers say this may be because the machine cannot put the word in context, or because the speaker inhales just before talking.
Computers also make mistakes with words that sound similar and can occur in similar contexts, such as ‘him’ and ‘them’ in ‘I saw him’ and ‘I saw them’.
Such confusions become more likely when the word is not enunciated clearly. Variations in pitch, tone and speed can also cause the system to misunderstand voices.
Researchers carried out the study by recording phone discussions between pairs of people and feeding their conversations into a speech recognition system to see how much it could understand.
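The article does not say how the researchers scored the system's output. A common way to measure how much a recogniser understood is the word error rate: the number of word substitutions, deletions and insertions needed to turn the system's transcript into what was actually said, divided by the number of words spoken. The sketch below illustrates that general measure only. The function names and sample sentences are hypothetical and are not taken from the study.

```python
def word_errors(reference: str, hypothesis: str) -> int:
    """Count the word-level edits (substitutions, deletions, insertions)
    separating what was said (reference) from what the system heard
    (hypothesis), using the standard edit-distance recurrence."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] holds the edit count between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # matching i reference words against nothing = i deletions
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word errors divided by the number of words actually spoken."""
    spoken = len(reference.split())
    if spoken == 0:
        raise ValueError("reference must contain at least one word")
    return word_errors(reference, hypothesis) / spoken


if __name__ == "__main__":
    # Hypothetical example: 'them' misheard as 'him'.
    print(word_error_rate("I saw them", "I saw him"))
```

In this example one of the three spoken words is substituted, so the error rate is one in three. A dropped first word or a filler sound that the system mistakes for a word would be counted in the same way, as a deletion or an insertion.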
The study, a collaboration between the University of Edinburgh and Stanford University, was published in the journal Speech Communication.
This work was supported by the Edinburgh-Stanford Link and the US Office of Naval Research.
“Voices vary from one person to the next and it is challenging to design a computer system that can understand lots of different voices. We hope that by closely studying how people speak and how machines process this, we can help create better systems that will be simple and efficient for people to use.”
Dr Sharon Goldwater, School of Informatics
This article was published on Mar 15, 2010