Edinburgh Imaging


19 Jul 17. Seeing speech

Seeing Speech is an articulatory web resource for the study of phonetics, developed through collaborative research by six Scottish universities.

Collaborative Research

The University of Glasgow, Queen Margaret University, the University of Strathclyde, the University of Edinburgh and the University of Aberdeen have been working together since 2011, with later collaborations with Napier University and University College London, to develop an online resource that provides examples of expert talkers for teachers and students of Practical Phonetics. The functioning of the human vocal tract is illustrated using a comprehensive set of target speech sounds, presented online as videos using ultrasound tongue imaging (UTI), magnetic resonance imaging (MRI) and 2D midsagittal head animations based on the MRI and UTI data.


Using MRI data

A large body of vocal tract data was gathered from MRI scans of volunteers, using the Siemens 3T large-bore Verio MRI system in the Edinburgh Imaging QMRI Facility. The website uses the dynamic MRI images (7.5 per second) from one of these volunteers, an expert in phonetics. MRI images allow us to view many aspects of the vocal tract in the varied configurations it adopts to create the sounds of speech. Particularly important structures are the moveable ones: the tongue, the mandible, the larynx, the epiglottis, the lips, and the soft palate. From raw MRI data it is possible to create a range of images, including cross-sectional coronal, axial or sagittal slices of the body. The midsagittal plane is the most familiar to students of phonetics, so it formed the focus for the website.

Recording audio & resynchronising audio and video

For the underlying research on speech production based on the MRI data, volunteers were shown International Phonetic Association symbols in a PowerPoint presentation, delivered via fibre-optic video goggles. To record speech inside the MRI scanner, we used an OptoAcoustics FOMRI™ III dual-channel, fibre-optic microphone system. As well as containing no metal parts and being safe to use in an MRI machine, this microphone and its associated software have a built-in noise-cancelling system that reduces the noise the MRI machine adds to the acoustic signal. Even so, for clarity on the website, the expert speaker was re-recorded in synchrony with the original audio, and the clean recording was used instead.

Animated video

The animated head is particularly unusual, since it is based on real articulatory movements of the whole system. Typically, such animations are oversimplified or even inaccurate, often ignoring larynx and mandible movement, for example. The animations were created in Autodesk Maya, using a 2D head rig that allows control of the jaw, tongue, lips, larynx, soft palate and uvula.

For a full description please click here.

Using the International Phonetic Alphabet table here, you can then view individual animated videos for each symbol, for both consonants and vowels. Simply click on each symbol.

For example, a film showing a bilabial plosive in /a_a/ and /i_i/ environments.

The outcome

The resource is already providing teachers and students of Practical Phonetics with a unique insight into speech production. Thanks to the UTI videos, it also gives them a means to extrapolate from the more informative and articulatorily comprehensive MRI videos to ultrasound images, which they can access in real time in their university laboratories. Compared to MRI, ultrasound provides much more limited images (showing mainly just the tongue: its midsagittal shape, location and movement), but ultrasound is often available to students, is more generally applicable in phonetic research due to its high frame rate (120 images per second), can be used by non-specialists for laboratory recordings or fieldwork, and provides clean audio data. The complementary strengths of the two techniques are enabling a surge of interest in speech articulation.
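To make the frame-rate difference concrete, the short Python sketch below compares the two rates quoted in this article (7.5 images per second for the dynamic MRI, 120 for ultrasound). The 100 ms gesture duration is a purely illustrative assumption, not a figure from the project, and the function names are hypothetical.

```python
# Frame rates quoted in the article.
MRI_FPS = 7.5    # dynamic MRI images per second
UTI_FPS = 120.0  # ultrasound tongue images per second


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive images, in milliseconds."""
    return 1000.0 / fps


def frames_in(duration_ms: float, fps: float) -> float:
    """Average number of images captured during an event of the given length."""
    return duration_ms * fps / 1000.0


# Assumption for illustration only: a brief articulatory gesture of 100 ms.
GESTURE_MS = 100.0

mri_interval = frame_interval_ms(MRI_FPS)  # about 133 ms between MRI images
uti_interval = frame_interval_ms(UTI_FPS)  # about 8 ms between ultrasound images
ratio = UTI_FPS / MRI_FPS                  # ultrasound samples 16 times as often

print(f"MRI: one image every {mri_interval:.1f} ms")
print(f"UTI: one image every {uti_interval:.1f} ms")
print(f"Ultrasound captures {ratio:.0f}x as many images per second")
print(f"Images during a {GESTURE_MS:.0f} ms gesture: "
      f"MRI {frames_in(GESTURE_MS, MRI_FPS):.2f}, "
      f"UTI {frames_in(GESTURE_MS, UTI_FPS):.0f}")
```

On these numbers, a gesture of that assumed length would on average fall between two MRI images but span about a dozen ultrasound images, which is one way to read the claim that ultrasound is more generally applicable to phonetic research.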

It is hoped that further work by software developers will benefit children affected by a range of clinical speech disorders, and their families. Seeing Speech is already being used by Speech and Language Therapists, by students, by accent coaches in the acting profession, and by language learners. The Seeing Speech site has had over one and a half million hits to date.