University experts are looking for people from Scotland and beyond to transcribe thousands of hours of Gaelic-language recordings from some of the country’s most significant archives.
The project will expand the written and spoken examples of Scottish Gaelic available for automatic speech recognition, helping computers to understand the language and turn it into text, and supporting other digital projects, researchers say.
In doing so, it aims to enrich Scotland’s cultural heritage and bolster research for Gaelic learners and hearing-impaired users worldwide.
Volunteers will be asked to transcribe archival recordings of traditional folklore into text. This will make online archive portals more useful for the public and researchers alike.
Sound archives
The project, Opening the Well, will transcribe the sounds of Scotland’s traditions held in two sound archives, which are presented on the Tobar an Dualchais/Kist o Riches web portal. The portal is an online resource that contains material from these archives and from the BBC.
Among them are 1,500 hours of Gaelic storytelling recordings from the School of Scottish Studies Archives, based at the University of Edinburgh, and The Canna Collection, in the care of the National Trust for Scotland.
Creating a community
Researchers say opening up this digital work will help create a community in which volunteers share progress and tips and discuss their work.
Their transcriptions will allow full-text search, opening up thousands of pages of history, traditions and stories for study.
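To give a rough sense of what full-text search over transcripts involves, the sketch below builds a small inverted index in Python: a lookup from each word to the recordings whose transcript contains it. It is an illustration only, not the portal's actual search system. The recording IDs, the sample Gaelic phrases and the function names are all invented for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into words.

    The pattern matches runs of Unicode letters, so accented Gaelic
    vowels such as "ì" stay inside their words. Hyphens and apostrophes
    split words, which is a simplification.
    """
    return re.findall(r"[^\W\d_]+", text.lower())


def build_index(transcripts):
    """Map each word to the set of recording IDs whose transcript contains it."""
    index = defaultdict(set)
    for rec_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(rec_id)
    return index


def search(index, query):
    """Return the IDs of recordings that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical transcripts, invented for this example.
transcripts = {
    "rec-001": "Bha rìgh ann uair",
    "rec-002": "Bha cailleach ann uair",
    "rec-003": "Òran mu dheidhinn a' chuain",
}

index = build_index(transcripts)
matches = search(index, "bha uair")
```

Here `matches` holds the two recordings whose transcripts contain both query words. A production search service would also need to handle spelling variants and ranking, which this sketch leaves out.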
Researchers will then assemble a large body of Gaelic language data and use it to generate a high-quality automatic speech recognition (ASR) system for media, education and research.
Speech recognition
Project leads say the data will advance Gaelic automatic speech recognition by giving computers many spoken and written examples of Gaelic, so they can learn to recognise speech and process the language accurately.
It will also create machine learning datasets of Gaelic words, sentences and audio clips to help AI systems learn the patterns of the language.
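As a rough illustration of what such a dataset might look like, the Python sketch below pairs audio clips with their transcribed text and writes them out as JSON Lines, a layout often used for speech training data. The project's real data format is not described here, so the field names, file path, timings and sample phrases are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Clip:
    """One training example: a stretch of audio and its transcription."""
    audio_path: str   # hypothetical path to the source recording
    start_s: float    # clip start, in seconds from the start of the recording
    end_s: float      # clip end, in seconds
    text: str         # volunteer transcription of the clip


def to_jsonl(clips):
    """Serialise clips as JSON Lines, one JSON object per line.

    ensure_ascii=False keeps accented Gaelic letters readable in the output.
    """
    return "\n".join(json.dumps(asdict(c), ensure_ascii=False) for c in clips)


def total_hours(clips):
    """Total duration of all clips, in hours."""
    return sum(c.end_s - c.start_s for c in clips) / 3600


# Hypothetical clips, invented for this example.
clips = [
    Clip("audio/rec-001.wav", 0.0, 12.5, "Bha rìgh ann uair"),
    Clip("audio/rec-001.wav", 12.5, 30.0, "Bha cailleach ann uair"),
]

manifest = to_jsonl(clips)
```

Each line of `manifest` can be read back on its own with `json.loads`, which makes a file like this easy to stream when it grows to thousands of hours of audio.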