# 71st Language at Edinburgh lunch

The Language@Edinburgh Lunch is a bi-monthly opportunity to present your work to an interdisciplinary audience in an intimate and feedback-rich setting, all while enjoying a buffet lunch.

Posters by both postgraduate students and academic staff are welcome on any area of human language research, including all sub-fields of linguistics, philosophy of language, natural language processing, psycholinguistics, and any other language-related discipline. Reports on work in progress are equally welcome.

## Posters

Acoustic classification of glottal stops in Upper Necaxa Totonac - Rebekka Puderbaugh

Glottal stops are known to be quite variable in their production, often appearing as non-modal phonation rather than an interval of complete closure. Glottal stops in Upper Necaxa Totonac (UNT, spoken by about 3400 people in the Northern Sierra of Puebla State in Mexico) function both as phonemic segments adjacent to vowels and as prosodic markers of phrase boundaries. Acoustically, glottal stops are often easily confused with laryngealized vowels, which are also contrastive in UNT. More than 70% of glottal stops transcribed in the Upper Necaxa Totonac Dictionary (Beck, 2011) were preceded by laryngealized vowels (Puderbaugh, 2019), suggesting that glottal stops may sometimes be allophonic realizations of laryngealized vowels, or vice versa. Glottal stops may also appear in fricative + stop clusters that have previously been reported as ejective fricatives. The present study investigates the acoustic realizations of /ʔ/ in relation to the phonetic contexts in which they occur and relates the findings to issues in methodology for projects in documentary linguistics. The data used in this study were provided by four speakers of UNT. Speakers were presented with word list items one at a time and asked to produce each word within a frame sentence. The data reveal approximately six acoustic types of /ʔ/ tokens, ranging from intervals of complete closure to intervals of reduced amplitude between neighboring vowels. Each acoustic type is described according to its frequency of occurrence, the phonetic contexts in which it occurs, and any observed inter- or intra-speaker variability. In fricative + /ʔ/ clusters, glottal stops appeared most frequently as distinct closure intervals, while glottal stops between vowels were often realized as intervals of non-modal phonation. The data also reveal instances of apparent [ʔ] that do not appear in the dictionary forms and therefore raise questions about how the phonemic categories have been established and defined. The results of this study highlight the usefulness of phonetic annotation and acoustic data for documentary linguistics, especially with respect to the establishment of methodologies that will lead to reliably reproducible phonemic analyses.

San Diu – Is it a variety of Cantonese or is it something else? - Matthew Sung

San Diu, a language of Northern Vietnam spoken mostly in the Tuyen Quang, Thai Nguyen, Vinh Phuc, Bac Giang and Quang Ninh provinces, is understudied. The genetic relationship between San Diu and other languages is still not clear. There have been claims that San Diu is a form of Chinese (Pham & Nguyen 2014: 89). Edmondson and Gregerson (2007: 744) stated that it is a form of archaic Cantonese, possibly related to Pinghua, which is spoken in modern-day Guangxi, China. Haudricourt (1960) compared five languages in the region of Moncay with Cantonese and Hakka and classified San Diu under Hakka. In her 2013 study, Nguyen compared San Diu vocabulary with three Chinese dialects: Yue (Guangzhou), Hakka (Meixian) and Southern Min (Teochew). She found that around two thirds of the San Diu vocabulary is similar to Hakka (lexically and, for some items, phonetically). To explore the genetic classification of San Diu further, I use shared innovations as a criterion for classification in this paper. This offers another way to test, and potentially falsify, previous claims and the observations made through surface synchronic comparison between Chinese dialects and San Diu. Innovations that are prototypical of and unique to three Chinese dialect groups were chosen and compared with San Diu. Over 400 syllables were analysed overall. The results show that, firstly, a large number of words are not of Sinitic origin. Secondly, San Diu shares innovations with Yue and Hakka. I argue that the Sinitic words in San Diu largely came from Yue, since more innovations are shared with Yue than with Hakka. This, however, does not rule out the possibility that Hakka words made their way into San Diu. Further studies are needed for a deeper understanding of the origin of this language.

The System of Long Monophthongs in Central Mount Lebanon Lebanese - Georges Sakr

This project aims to determine the vowel inventory of Central Mount Lebanon Lebanese (henceforth CMLL) on the basis of a comprehensive acoustic analysis of data from 19 speakers. Through a subsequent phonological analysis of my results, I make a case for the existence of long front and back close-mid monophthongs, and a front-back contrast in low vowels (which is typologically rare), within both the surface and underlying phonological inventories. I discuss the diachronic origins of the [e] and [o] vowel qualities in CMLL, and argue that the underlying phonological inventory of CMLL has seen these various origins merge into two synchronic categories. Studies on the vocalic inventories of Arabic dialects typically postulate three short monophthongs and three- or five-vowel systems for long monophthongs (e.g. Watson, 2002; Al-Ani, 1970). The six-quality underlying long vowel system I am proposing thus goes against traditional expectations and descriptions of the vocalic inventories of Arabic dialects.

Sequence Labeling Parsing by Learning Across Representations - Michalina Strzyz

We use parsing as sequence labeling as a common framework to learn across constituency and dependency syntactic abstractions. To do so, we cast the problem as multitask learning (MTL). First, we show that adding a parsing paradigm as an auxiliary loss consistently improves the performance on the other paradigm. Secondly, we explore an MTL sequence labeling model that parses both representations, at almost no cost in terms of performance and speed. The results across the board show that, on average, MTL models with auxiliary losses outperform single-task ones by 1.14 F1 points for constituency parsing and by 0.62 UAS points for dependency parsing.
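As a rough illustration of the auxiliary-loss idea mentioned in this abstract, the sketch below combines a main per-token labeling loss with a down-weighted loss from a second labeling task. It is not the authors' implementation: the function names, the toy label indices and the `aux_weight` value are all assumptions made for the example.

```python
import math

def cross_entropy(probs, gold):
    """Mean negative log-likelihood of the gold label at each token.

    probs: one probability distribution (list of floats) per token.
    gold:  one gold label index per token.
    """
    return -sum(math.log(p[g]) for p, g in zip(probs, gold)) / len(gold)

def mtl_loss(main_probs, main_gold, aux_probs, aux_gold, aux_weight=0.1):
    """Main-task loss plus a weighted auxiliary-task loss.

    aux_weight is a hypothetical hyperparameter, not a value from the abstract.
    """
    main = cross_entropy(main_probs, main_gold)
    aux = cross_entropy(aux_probs, aux_gold)
    return main + aux_weight * aux

# Toy two-token sentence: each token gets one label per paradigm.
main_probs = [[0.5, 0.5], [0.5, 0.5]]   # e.g. predicted constituency labels
aux_probs = [[0.5, 0.5], [0.5, 0.5]]    # e.g. predicted dependency labels
loss = mtl_loss(main_probs, [0, 1], aux_probs, [1, 0], aux_weight=0.1)
```

With a weight of zero the function reduces to single-task training, which makes the two settings easy to compare in a sketch like this.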

A Grounded Unsupervised Universal Part-of-Speech Tagger for Low-Resource Languages - Ronald Cardenas Acosta, Ying Lin, Heng Ji, Jonathan May

Unsupervised part-of-speech (POS) tagging is often framed as a clustering problem, but practical taggers need to ground their clusters as well. Grounding generally requires reference labeled data, a luxury a low-resource language might not have. In this work, we describe an approach for low-resource unsupervised POS tagging that yields fully grounded output and requires no labeled training data. We find the classic method of Brown et al. (1992) clusters well in our use case and employ a decipherment-based approach to grounding. This approach presumes a sequence of cluster IDs is a "ciphertext" and seeks a POS tag-to-cluster ID mapping that will reveal the POS sequence. We show intrinsically that, despite the difficulty of the task, we obtain reasonable performance across a variety of languages. We also show extrinsically that incorporating our POS tagger into a name tagger leads to state-of-the-art tagging performance in Sinhalese and Kinyarwanda, two languages with nearly no labeled POS data available. We further demonstrate our tagger's utility by incorporating it into a true "zero-resource" variant of the MaLOPa (Ammar et al., 2016) dependency parser model that removes the current reliance on multilingual resources and gold POS tags for new languages. Experiments show that including our tagger makes up much of the accuracy lost when gold POS tags are unavailable.
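The "ciphertext" framing in this abstract can be pictured with a toy example: treat cluster IDs as cipher symbols and search for the assignment of POS tags that makes the resulting tag sequence most plausible under a tag bigram model. The sketch below is a deliberately simplified brute-force version, not the authors' method; the tag set, the bigram scores, the one-tag-per-cluster restriction and all function names are assumptions for illustration.

```python
import math
from itertools import permutations

def sequence_score(tags, bigram_logp, unseen=-10.0):
    """Sum of log-probabilities of adjacent tag pairs; unseen pairs get a penalty."""
    return sum(bigram_logp.get((a, b), unseen) for a, b in zip(tags, tags[1:]))

def ground_clusters(cluster_seq, tagset, bigram_logp):
    """Try every one-to-one cluster-to-tag assignment and keep the best-scoring one.

    Exhaustive search is only feasible for tiny toy inputs like the one below.
    """
    clusters = sorted(set(cluster_seq))
    best_map, best_score = None, -math.inf
    for perm in permutations(tagset, len(clusters)):
        mapping = dict(zip(clusters, perm))
        score = sequence_score([mapping[c] for c in cluster_seq], bigram_logp)
        if score > best_score:
            best_map, best_score = mapping, score
    return best_map

# Hypothetical tag bigram log-probabilities (toy values, not from any corpus).
bigram_logp = {("DET", "NOUN"): -0.1, ("NOUN", "VERB"): -0.3, ("VERB", "DET"): -0.4}
cluster_seq = [7, 2, 5, 7, 2]  # arbitrary cluster IDs for a five-token sentence
mapping = ground_clusters(cluster_seq, ["DET", "NOUN", "VERB"], bigram_logp)
```

In this toy setting the search recovers a determiner-noun-verb reading of the cluster sequence; a realistic system would need many-to-one mappings and a far more efficient search.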

## Contact

LANGUAGE Lunch

### Language at Edinburgh Lunch committee

Matthew King, Carine Abraham, Esperanza Ramos Badaya, Jie Chi, Nina Markl, Pauliina Vuorinen, Pilar Oplustil Gallegos

## Further information

The Language at Edinburgh Lunch is made possible through funding from the School of Philosophy, Psychology and Language Sciences and the Human Communication Research Centre, with the intent to facilitate interdisciplinary language research at the University of Edinburgh.

2020-02-13: Lunch meeting

Room G.07, The Informatics Forum, 10 Crichton Street, Edinburgh, EH8 9AB