Author: Katharina EHRET

An information-theoretic approach to assess morpheme and construction complexity in English

Abstract: I explore the impact of morpheme and construction complexity on the linguistic complexity of English, drawing on state-of-the-art quantitative, information-theoretic methodologies. More specifically, I assess the degree to which individual inflectional morphemes such as –ing or –ed, and functional constructions such as the progressive (be + verb-ing) or the perfect (have + verb past participle), contribute to the complexity of English. The long-standing assumption that all languages are of equal complexity remained unchallenged for much of the 20th century. Recently, however, this dogma has been questioned and scrutinised. Two central issues in the current complexity debate are, firstly, how to find a generally applicable definition of complexity and, secondly, how to measure this complexity. Methodologically, I present an objective and economical way to assess linguistic complexity using an unsupervised, algorithmic, information-theoretic measure (Juola 1998, 2008). This measure boils down to the notion of Kolmogorov complexity, which can be conveniently approximated with modern file compression programmes. Kolmogorov complexity measures the information content of a (text) string by the length of the shortest algorithm required to (re)generate the exact string (cf. Juola 2008). In a pilot study, I draw up a set of N = 10 features comprising (i) inflectional morphemes: –ing, –ed, genitive ’s, plural –s/–es and third person singular –s; and (ii) a handful of functional constructions: progressive aspect be + verb-ing, perfect aspect have + verb past participle, passive voice be + verb past participle and the future markers will and going to. These morphemes and constructions will be analysed in three different text types of English: literary writing, newspaper texts and religious texts. The weight and impact of each individual feature on complexity will be assessed by manipulating the respective morpheme or construction in the texts and comparing the complexity measurements before and after manipulation.
I find that frequently recurring regular features, such as the morphemes –ing and –ed, decrease complexity. On the other hand, features marking “exceptions”, such as third person singular –s, which attaches only to third person singular verbs and thus co-occurs only with certain pronouns or proper nouns (i.e. in a restricted context), increase complexity. I demonstrate that Kolmogorov complexity measurements yield linguistically meaningful results and can be used to assess the quantitative effect of individual morphemes and functional constructions on linguistic complexity. I conclude by sketching directions for further research and discussing the advantages and drawbacks of the method.
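The compression-based approximation described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the abstract does not name a compression programme or specify how features are manipulated, so this sketch assumes Python's standard zlib module as the compressor and uses a deliberately crude, hypothetical manipulation (`strip_suffix`) that deletes a word-final suffix wherever it follows at least three word characters. The function names `compressed_size`, `complexity_ratio`, `strip_suffix` and `compare` are illustrative only.

```python
import re
import zlib


def compressed_size(text: str) -> int:
    """Size in bytes of the zlib-compressed UTF-8 encoding of the text.

    Assumption: zlib stands in for the unspecified 'file compression
    programme'; the compressed size is only an upper-bound proxy for
    Kolmogorov complexity.
    """
    return len(zlib.compress(text.encode("utf-8"), 9))


def complexity_ratio(text: str) -> float:
    """Compressed size divided by raw size (lower = more compressible)."""
    raw = len(text.encode("utf-8"))
    return compressed_size(text) / raw if raw else 0.0


def strip_suffix(text: str, suffix: str) -> str:
    """Hypothetical manipulation: delete a word-final suffix.

    The suffix is removed only after a stem of three or more word
    characters, so 'walking' becomes 'walk' but 'sing' is left alone.
    This is purely orthographic and will also hit non-morphemic strings.
    """
    pattern = r"\b(\w{3,})" + re.escape(suffix) + r"\b"
    return re.sub(pattern, r"\1", text)


def compare(text: str, suffix: str) -> dict:
    """Compressed sizes of the text before and after the manipulation."""
    manipulated = strip_suffix(text, suffix)
    return {
        "original": compressed_size(text),
        "manipulated": compressed_size(manipulated),
    }


if __name__ == "__main__":
    sample = (
        "She was walking home while he was talking to the neighbours. "
        "They were laughing and the children were playing outside. "
    ) * 20
    print(complexity_ratio(sample))
    print(compare(sample, "ing"))
```

On very short inputs the compressor's fixed overhead dominates the result, so a comparison of this kind is only informative on texts of substantial length. A linguistically sound manipulation would also need morphological annotation rather than string matching.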