Language evolution seminar

Speaker: Andres Karjus (Centre for Language Evolution, University of Edinburgh)

Title: Challenges in detecting evolutionary forces in language change using diachronic corpora

Abstract: Newberry et al. (Detecting evolutionary forces in language change, Nature 551, 2017) tackle an important but difficult problem in linguistics: testing selective theories of language change against a null model of drift. They use the Frequency Increment Test (FIT), an application of the t-test to detect signatures of selection in experimental and diachronic data. Having applied the test to a number of relevant examples, they suggest that stochasticity plays a previously under-appreciated role in language evolution. They also infer the effective population size and show that the strength of drift correlates inversely with corpus frequency, echoing the analogous observation about small populations in genetics. We replicate their FIT-based results and find that, while the overall finding that drift is prevalent holds, the results the test produces for individual time series are highly sensitive to how the corpus is divided into temporal segments (binning). We further investigate the properties of the FIT using a large, controlled set of simulated time series, systematically exploring the range of conditions under which the test is applicable and the artefacts introduced by the binning protocol. The approach proposed by Newberry et al. provides a systematic way of generating hypotheses about language change and of drawing broad generalizations across a sample of time series, marking another step forward in research on large-scale linguistic data with a deep diachronic dimension. However, we argue that its limitations need to be appreciated alongside its possibilities. Caution should be exercised in interpreting the results of the FIT (or a similar test) on individual corpus-based linguistic time series, given the limitations of the test, its demonstrable bias in certain scenarios, and the fundamental differences between genetic and linguistic data.
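To make the FIT and the binning issue concrete, here is a minimal sketch, not the method or code used by Newberry et al. or in this talk. It assumes the usual FIT rescaling of successive frequencies, Y_i = (v_i - v_{i-1}) / sqrt(2 v_{i-1}(1 - v_{i-1})(t_i - t_{i-1})), followed by a one-sample t-test of the null hypothesis that the mean of the Y_i is zero (drift). The helper names `fit`, `bin_series` and `t_two_sided_p`, the equal-width binning scheme, and the synthetic corpus (neutral Wright-Fisher drift with a hypothetical population size of 5000, sampled at a hypothetical 50 to 500 tokens per year) are all illustrative assumptions. The p-value is computed by numerically integrating the Student's t density so the sketch needs only the standard library; in practice `scipy.stats.ttest_1samp` would do the same job.

```python
import math
import random
import statistics


def t_two_sided_p(t_stat, df, steps=2000):
    """Two-sided Student's t p-value via Simpson integration of the density."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)

    def pdf(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    a, n = abs(t_stat), steps  # steps must be even for Simpson's rule
    h = a / n
    inner = sum((4 if k % 2 else 2) * pdf(k * h) for k in range(1, n))
    area = (pdf(0.0) + pdf(a) + inner) * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)


def fit(freqs, times):
    """Frequency Increment Test: one-sample t-test of rescaled increments vs. 0."""
    if any(not 0 < v < 1 for v in freqs[:-1]):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    # Y_i = (v_i - v_{i-1}) / sqrt(2 v_{i-1} (1 - v_{i-1}) (t_i - t_{i-1}))
    y = [
        (v1 - v0) / math.sqrt(2 * v0 * (1 - v0) * (t1 - t0))
        for v0, v1, t0, t1 in zip(freqs, freqs[1:], times, times[1:])
    ]
    t_stat = statistics.mean(y) / (statistics.stdev(y) / math.sqrt(len(y)))
    return t_stat, t_two_sided_p(t_stat, len(y) - 1)


def bin_series(years, variant, total, n_bins):
    """Pool yearly counts into n_bins equal-width bins (hypothetical scheme)."""
    lo, hi = min(years), max(years) + 1
    width = (hi - lo) / n_bins
    v_sum, n_sum = [0] * n_bins, [0] * n_bins
    for yr, v, n in zip(years, variant, total):
        b = min(int((yr - lo) / width), n_bins - 1)
        v_sum[b] += v
        n_sum[b] += n
    kept = [(lo + (b + 0.5) * width, v_sum[b] / n_sum[b])
            for b in range(n_bins) if n_sum[b] > 0]  # skip empty bins
    return [t for t, _ in kept], [f for _, f in kept]


# Synthetic corpus: neutral Wright-Fisher drift (hypothetical N = 5000),
# sampled each year with a hypothetical corpus size of 50-500 tokens.
rng = random.Random(42)
years = list(range(1800, 2000))
p, N = 0.5, 5000
variant, total = [], []
for _ in years:
    p = sum(rng.random() < p for _ in range(N)) / N  # one generation of drift
    n = rng.randint(50, 500)
    variant.append(sum(rng.random() < p for _ in range(n)))
    total.append(n)

# The same underlying counts, binned three different ways.
for n_bins in (5, 10, 20):
    times, freqs = bin_series(years, variant, total, n_bins)
    t_stat, p_val = fit(freqs, times)
    print(f"{n_bins:>2} bins: t = {t_stat:+.2f}, p = {p_val:.3f}")
```

Because the binning changes both how many increments enter the test and which frequencies each increment compares, the three runs can return different t statistics and p-values from identical yearly counts. This is the kind of sensitivity described above; the sketch itself says nothing about how large the effect is on real corpora.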



Date: Nov 06 2018

Venue: Seminar room 4 (B.02), Chrystal Macmillan Building, 15a George Square, Edinburgh, EH8 9LD