Àidh, Robot - researchers develop first ASR system for Scottish Gaelic
Will Lamb outlines what Automatic Speech Recognition means for the future of Scotland’s Gaelic language.
Automatic Speech Recognition (ASR) systems determine how we use our voices to interact with smart devices.
They are a cornerstone of modern language technology but, to date, have remained undeveloped for Scottish Gaelic.
Now, researchers from Celtic and Scottish Studies and Data Science and Digital Humanities at the University of Edinburgh have partnered with the University of the Highlands and Islands (UHI) and Quorate Technology Limited to put that right.
In this extract from the Edinburgh Impact article, 'Hi-tech tool prompts hope of virtual assistants fluent in Gaelic', Principal Investigator Dr Will Lamb talks about bringing the project to fruition.
The latest project in a body of language technology research
Developing an ASR system is part of an iterative research programme at the University of Edinburgh devoted to developing language technology tools for Scottish Gaelic.
Specifically, it builds upon two of Dr Lamb’s previous studies: the Gaelic Part-of-Speech Tagging Project (funded by the Carnegie Trust and Bòrd na Gàidhlig); and the Gaelic Handwriting Recognition Project (funded by the University's Challenge Investment Fund), which involved digitising hundreds of manuscripts in the School of Scottish Studies Archives.
Through this work, the researchers were able to amass millions of spoken and written Gaelic words. Artificial Intelligence experts - including LLC Chancellor's Fellow, Dr Beatrice Alex - then used the data to train a computer system to analyse and process Gaelic speech similarly to how humans do.
The project has various applications, including in Computer-Assisted Language Learning (CALL) and in media subtitling. Speaking of its importance, Will says: “Ensuring that Gaelic has a place in the modern technological landscape is key for its survival. By enlisting the support and expertise of the Gaelic community, and giving back to them in this way, we hope to demonstrate that any minority language can thrive in the digital age.”
Challenges and solutions
In the world of machine learning, the more data you have, the better your system will be.
As well as Dr Lamb’s earlier work, the ASR development incorporates two recent spoken language ethnographic recording projects - Saoghal Thormoid and Stòras Beò nan Gàidheal - conducted by project partner, UHI.
Additionally, the team has started to work with Tobar an Dualchais to transcribe interviews with Gaelic speakers that include precious elements of oral history and traditional storytelling, and to collaborate with Gaelic broadcasting organisation MG Alba.
In this way, Will and the team are hoping to overcome the challenges of getting technology to understand the dialectal diversity of Scottish Gaelic, as he explains:
“English models are geared towards middle-class Home Counties in the UK, or standard east coast dialects in the US, but that approach doesn’t work with Gaelic. The most common dialect today, Lewis Gaelic, is markedly different from the dialects of neighbouring Harris and North Uist. So, despite the fact that we are catering to a small fraction of the speakers that you would have with English, our challenges are, in many ways, much more pronounced.”
Developing an ASR system for Scottish Gaelic is backed by the Data-Driven Innovation (DDI) initiative, which is led by the University of Edinburgh and Heriot-Watt University and is a key part of the Edinburgh and South East Scotland City Region Deal. The project has also received generous support from Soillse, the National Research Network for the Maintenance and Revitalisation of Gaelic Language and Culture.
UPDATE: In November 2021, the researchers won the Innovation Award at the Scottish Gaelic Awards 2021.
Are you interested in studying with us?
Home of the School of Scottish Studies Archives, we are the longest-established Celtic department in Scotland. Choose from a wide range of undergraduate degrees, including an MA Hons degree in Celtic and Linguistics. We also have a range of postgraduate programmes, including our Masters by Research in Scottish Ethnology.