Volunteers wanted to prepare Gaelic for the AI age

Scottish Gaelic-speaking volunteers are being sought to help with an ambitious scheme to ensure the language thrives in the digital age.

Graphic of a laptop and a black-and-white image from an archive recording Gaelic history

University experts are looking for people from Scotland and beyond to transcribe thousands of hours of Gaelic-language recordings from some of the country’s most significant archives.

The project will expand the written and spoken examples of Scottish Gaelic available for automatic speech recognition, helping computers to understand the spoken language, turn it into text and support other digital projects, researchers say.

In doing so, it aims to enrich Scotland’s cultural heritage and bolster research for Gaelic learners and hearing-impaired users world-wide. 

Volunteers will be asked to transcribe archival recordings of traditional folklore into text. This will make online archive portals more useful for the public and researchers alike. 

Sound archives

The project, Opening the Well, will transcribe the sounds of Scotland’s traditions held in two sound archives, which are presented on the Tobar an Dualchais/Kist o Riches web portal. The portal is an online resource that contains material from these archives and from the BBC.

Among them are 1,500 hours of Gaelic storytelling recordings from the School of Scottish Studies Archives, based at the University of Edinburgh, and The Canna Collection, in the care of the National Trust for Scotland.

Creating a community

Researchers say opening up this digital work will help create a community with volunteers sharing progress, tips and discussion.

Their transcriptions will allow full-text search, opening up thousands of pages of history, traditions and stories for study.

Researchers will then assemble a large body of Gaelic language data and use it to generate a high-quality automatic speech recognition (ASR) system for media, education and research.

Speech recognition

Project leads say the data will advance Gaelic automatic speech recognition: given many spoken and written examples of Gaelic, computers can learn to recognise speech and process the language accurately.

It will also create machine learning datasets of Gaelic words, sentences and audio clips to help AI systems learn the patterns of the language.


By turning spoken heritage into fully annotated, searchable text, the project not only safeguards Scotland’s intangible cultural legacy but also generates the vital data needed to advance Gaelic speech recognition and other forms of language technology. With just a broadband connection and an interest in language and heritage, anyone can contribute – and in doing so, take part in a tangible act of cultural revitalisation. Opening the Well brings the Gaelic community together with academia and technology to unlock the voices of the past and power the future of the language.

We’re delighted to be working in partnership with the University of Edinburgh and NTS on the Opening the Well project. This will add significantly to the number of transcriptions available on the Tobar an Dualchais/Kist o Riches website, which in turn will allow greater access to our shared Gaelic heritage. Drawing on the expertise of Gaelic speakers to transcribe recordings is also something we’ve been interested in developing for some time, and we look forward to supporting those who are keen to get involved.

This is a really worthwhile and exciting project which will only serve to enhance access to the wonderful collection of sound recordings undertaken by John Lorne Campbell of Canna.

The School of Scottish Studies Archives are delighted that their internationally renowned collections of audio recordings and transcriptions have formed the foundations of ÈIST, a truly 21st-century project. In a time when the fast pace of change in AI technologies makes many uneasy, our heritage professionals have ensured the responsible, ethical re-use of the archives. This aspect is particularly important as, over nearly 75 years, these archives have been gathered from and relate directly to communities throughout the Gàidhealtachd. The crowdsourcing element of this phase, Fosgladh an Tobair/Opening the Well, is a tremendous opportunity for people to contribute to both their own history and future: the project simultaneously improves the historical record and contemporary language technology. It is marvellous that our collections will once again find a new audience and a new role in cultural life. Archives are never truly in the past; they continue to be a factor in our future.

The Opening the Well project adds to a series of projects led by linguists and data experts at the University of Edinburgh to connect Scottish Gaelic with advances in artificial intelligence, digital communication and language technology.

People can apply for the scheme via the Opening the Well website.

Opening the Well is backed by a grant from the Scottish Government (‘Ecosystem for Interactive Speech Technology’ or ‘ÈIST’) and prior work funded by the UKRI Arts and Humanities Research Council (AHRC).

The Opening the Well website is launching at an event at the University of Glasgow on Tuesday, 2 December as part of this year’s Angus Matheson Memorial Lecture, delivered by Professor Will Lamb.

 
