The grammatical complexity of Tok Pisin: A quantitative assessment

Abstract/Résumé: Recent studies (McWhorter, 2001a, 2001b, 2005) have claimed that the grammars of creole languages are substantially less complex than the grammars of other, more established languages. This claim has aroused a good deal of disagreement in the linguistics community (e.g., Kuster & Muysken, 2001; Plag, 2001; Baptista, 2003; Siegel, 2004). In this study, I analyze a parallel corpus of Tok Pisin and English, the Wantok corpus (Slone, 2001a, 2001b). The corpus consists of 1,048 folk tales that appeared between 1972 and 1997 in the Stori Tumbuna section of Wantok, a Tok Pisin newspaper from Papua New Guinea. For each tale, the corpus contains the Tok Pisin text and an English translation. In total, there are 872,706 word tokens in the Tok Pisin version and 762,143 in the English one. This is the largest available machine-readable corpus of Tok Pisin. I computed multiple information-theoretic measures of grammatical complexity on both halves of the corpus. The complexity values obtained for the Tok Pisin texts did not differ significantly from those observed for the English texts, a result that does not support the purported grammatical 'over-simplicity' of creoles. However, from a diachronic perspective, the data appear to support a temporal trend of increasing complexity in Tok Pisin morphological structure. In summary, the results presented here are most consistent with the idea that languages balance their grammatical complexity across the different levels (morphological, syntactic, etc.), maintaining an approximately constant overall level of complexity (e.g., Moscoso del Prado, 2012), with creoles being no clear exception to this pattern. However, the temporal increase in the morphological complexity measures suggests that morphological structure may indeed require a relatively long time to arise (although this is compensated for by other aspects of the grammar).
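
As an illustration only: the abstract does not say which information-theoretic measures were used, so the sketch below assumes one of the simplest candidates, the Shannon entropy of the unigram word distribution, computed separately for each side of a parallel text. The function names `tokenize` and `word_entropy`, the whitespace tokenization, and the two toy sentences are all hypothetical. The sentences are made up for this example and are not taken from the Wantok corpus.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def word_entropy(tokens):
    """Shannon entropy, in bits per token, of the unigram word distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy parallel pair (invented for illustration, not corpus data).
tok_pisin = "mi lukim wanpela man na mi lukim wanpela meri"
english = "i saw a man and i saw a woman"

for name, text in [("Tok Pisin", tok_pisin), ("English", english)]:
    tokens = tokenize(text)
    print(f"{name}: {len(tokens)} tokens, "
          f"{word_entropy(tokens):.3f} bits per token")
```

A unigram entropy of this kind reflects lexical diversity only. Measures aimed at morphological or syntactic structure would need richer models, and nothing here should be read as the procedure actually applied in the study.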