SUMMA delivers a state-of-the-art media monitoring system
Edinburgh Informatics researchers, in collaboration with colleagues from University College London, the University of Sheffield, the BBC and Deutsche Welle, have developed a system that will help journalists monitor media sources, a task that has grown too large for newsrooms to handle since the rise of the internet and digital media.
SUMMA (Scalable Understanding of Multilingual Media) has developed a scalable, multilingual monitoring platform that incorporates media processing tools and natural language processing technologies.
Automated monitoring and translation
The platform’s fully automated monitoring system takes in content through an application programming interface. The system then automatically transcribes all audio from video, turning speech into text, and translates all text, whether from original articles or from transcribed speech, into English. Project partners developed state-of-the-art speech recognition and machine translation systems for German, English, Spanish, Latvian, Portuguese, Arabic, Persian (Farsi), Russian and Ukrainian. The platform currently processes these languages, but it can cover virtually all major languages by integrating off-the-shelf tools.
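The flow described above (ingest, transcribe where needed, translate into English) can be pictured with a minimal Python sketch. It is an illustration only, not SUMMA's actual code: the `MediaItem` and `ProcessedItem` types, the `process` function and the `transcribe` and `translate` callables are all hypothetical names, and the speech recognition and machine translation engines are passed in as plain functions so that any off-the-shelf tool could be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MediaItem:
    """Hypothetical ingested item: a text article or an audio/video clip."""
    item_id: str
    language: str                   # source language code, e.g. "de"
    text: Optional[str] = None      # set for text articles
    audio: Optional[bytes] = None   # set for audio/video items


@dataclass
class ProcessedItem:
    """Hypothetical result: source text plus its English rendering."""
    item_id: str
    source_language: str
    source_text: str
    english_text: str


def process(
    item: MediaItem,
    transcribe: Callable[[bytes, str], str],
    translate: Callable[[str, str, str], str],
) -> ProcessedItem:
    """Turn one ingested item into English text.

    `transcribe(audio, language)` and `translate(text, source, target)`
    are assumed stand-ins for whichever engines are integrated.
    """
    if item.text is not None:
        source_text = item.text
    elif item.audio is not None:
        # Speech is turned into text before anything else happens.
        source_text = transcribe(item.audio, item.language)
    else:
        raise ValueError(f"item {item.item_id} has neither text nor audio")

    # English content needs no translation step.
    if item.language == "en":
        english_text = source_text
    else:
        english_text = translate(source_text, item.language, "en")

    return ProcessedItem(item.item_id, item.language, source_text, english_text)
```

Passing the engines in as arguments mirrors the article's point that further languages can be covered by swapping in other tools, without changing the surrounding flow.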
Automated summarisation and sentiment analysis
A cross-lingual overview of the content is created. Next, related items are clustered into stories, stories and items are summarised, and topical keywords and sentiment analysis are added. The project team designed, developed and deployed the platform, several prototypes of which were then tested by journalists at the BBC and Deutsche Welle. The BBC already uses a prototype transcription engine from SUMMA that makes material ingested by BBC Monitoring searchable in a user-friendly way for monitoring journalists.
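To make the idea of clustering related items into stories and attaching topical keywords more concrete, here is a toy Python sketch. It is not the method SUMMA uses: it swaps in a deliberately simple technique (frequency-based keywords and greedy grouping by Jaccard overlap of keyword sets), and the function names, stopword list and threshold are all assumptions chosen for illustration. Summarisation and sentiment analysis are left out.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = frozenset(
    {"the", "a", "an", "of", "in", "on", "and", "to", "is", "for", "at", "by", "with"}
)


def keywords(text: str, k: int = 5) -> set:
    """Return up to k of the most frequent non-stopword terms in the text."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return {word for word, _ in Counter(words).most_common(k)}


def jaccard(a: set, b: set) -> float:
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (equal)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_into_stories(items: list, threshold: float = 0.3) -> list:
    """Greedily assign each item to the first story it overlaps enough with."""
    stories = []
    for text in items:
        item_keywords = keywords(text)
        for story in stories:
            if jaccard(item_keywords, story["keywords"]) >= threshold:
                story["items"].append(text)
                story["keywords"] |= item_keywords
                break
        else:
            # No existing story is similar enough, so start a new one.
            stories.append({"keywords": set(item_keywords), "items": [text]})
    return stories
```

For example, given the English texts "Flood hits city centre", "City centre flood damage grows" and "Election results announced today", this sketch groups the first two into one story and places the third in a story of its own. Because it works on English text, it would sit after the translation step, which is what makes the overview cross-lingual.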
The BBC also employs a system that uses the platform to alert BBC World Service teams to published stories that would make ideal candidates for translation. In addition, Deutsche Welle is using SUMMA components in the European Broadcasting Union project Eurovox, which is developing standards for automated language processing such as translation, transcription, subtitling and voice-over for broadcasting.
Spin-outs add value
Two spin-out companies have been established as a result of SUMMA. Based on the platform, Mindflux has developed a one-stop solution for automation-assisted content localisation that translates media at production quality. It will enable users to transcribe, translate and subtitle any audio, video or text in one place. Hatch AI has developed artificial intelligence and machine learning solutions for the financial services industry, building on the platform’s components.
“With the SUMMA platform, it’s easier than ever to aggregate, structure and analyse language data. Media professionals and newsrooms around the world can simply filter content to match their needs.”