Challenges and solutions
In the world of machine learning, the more data you have, the better your system will be.
As well as Dr Lamb’s earlier work, the ASR development draws on two recent spoken-language ethnographic recording projects, Saoghal Thormoid and Stòras Beò nan Gàidheal, conducted by project partner UHI.
Additionally, the team has started to work with Tobar an Dualchais to transcribe interviews with Gaelic speakers that include precious elements of oral history and traditional storytelling, and to collaborate with Gaelic broadcasting organisation MG Alba.
In this way, Will and the team are hoping to overcome the challenges of getting technology to understand the dialectal diversity of Scottish Gaelic, as he explains:
“English models are geared towards middle-class Home Counties in the UK, or standard east coast dialects in the US, but that approach doesn’t work with Gaelic. The most common dialect today, Lewis Gaelic, is markedly different from the dialects of neighbouring Harris and North Uist. So, despite the fact that we are catering to a small fraction of the speakers that you would have with English, our challenges are, in many ways, much more pronounced.”
The development of an ASR system for Scottish Gaelic is backed by the Data-Driven Innovation (DDI) initiative, which is led by the University of Edinburgh and Heriot-Watt University and is a key part of the Edinburgh and South East Scotland City Region Deal. The project has also received generous support from Soillse, the National Research Network for the Maintenance and Revitalisation of Gaelic Language and Culture.
UPDATE: In November 2021, the researchers won the Innovation Award at the Scottish Gaelic Awards 2021.