Contribution details

Author: Dunstan BROWN

Co-author(s): Roger EVANS, University of Brighton

Discovering classes in Tlatepuzco Chinantecan

Abstract: Oto-Manguean languages are well known for the complexity of their morphology, which combines tonal, stem and affixal patterns. We apply machine learning to compare two different accounts of Tlatepuzco Chinantec conjugation: those of Palancar (forthcoming) and Merrifield & Anderson (2007). The dataset is the sample of 775 verbs used by Palancar (forthcoming), based on Merrifield & Anderson’s dictionary. The method is based on that described in Brown and Evans (2012). First, Merrifield & Anderson (2007) is used as a reference data set. Its classification uses four-character class labels, such as {A17a}, which group verbs according to their tone for different persons in separate TAM combinations. We apply Brown and Evans’s (2012) method: i) creating a distance matrix using compression, based on the actual forms; ii) using the matrix to create an unordered tree which represents the similarity between the verbs; iii) confirming that there is a stable solution; iv) comparing the result with a reference tree created from the Merrifield & Anderson classification. This yields a score for how well the two trees match, and we interpret a greater degree of match as validation of the classification. Second, we consider Palancar’s proposal that the verbs can be grouped into basic conjugation classes, based on the notion of inflectional series. A similar method is applied as for Merrifield & Anderson’s classification: the stable tree created on the basis of the forms is compared with Palancar’s classification, again yielding a score for how well the two trees match. Because step (ii) of the method has a probabilistic element, it allows items to be partial members of classes.
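Step (i) can be illustrated with a minimal sketch. The abstract only says that the distance matrix is built "using compression", so the code below assumes the widely used normalized compression distance, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives compressed length. It also assumes Python's zlib as the compressor, which may not be the compressor Brown and Evans (2012) used. The verb names and paradigm strings are hypothetical placeholders, not Chinantec data, and steps (ii) to (iv) are not shown.

```python
import zlib
from itertools import combinations


def compressed_size(data: bytes) -> int:
    """Length in bytes of data after zlib compression at the highest level."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings (assumed measure)."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compress the concatenation x + y
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(paradigms: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric NCD matrix over every pair of verbs, keyed by (verb, verb)."""
    matrix: dict[tuple[str, str], float] = {}
    for a, b in combinations(sorted(paradigms), 2):
        # C(xy) and C(yx) can differ slightly, so average both orders
        # to obtain a symmetric distance (a design choice of this sketch).
        d = (ncd(paradigms[a], paradigms[b]) + ncd(paradigms[b], paradigms[a])) / 2
        matrix[(a, b)] = matrix[(b, a)] = d
    for verb in paradigms:
        # Real compressors give NCD(x, x) slightly above 0; use 0 by convention.
        matrix[(verb, verb)] = 0.0
    return matrix


# Hypothetical paradigms: each verb's inflected forms joined into one string,
# with digits standing in for tone marks. These are NOT real Chinantec forms.
paradigms = {
    "verb_1": "ka1 ka2 ka3 ka12",
    "verb_2": "ma1 ma2 ma3 ma12",
    "verb_3": "siu32 siu32 siu3 siu21",
}

matrix = distance_matrix(paradigms)
for a, b in combinations(sorted(paradigms), 2):
    print(f"{a} {b} {matrix[(a, b)]:.3f}")
```

Under these assumptions, the pairwise values would be the input to the tree-building step (ii); verbs whose concatenated forms compress well together receive smaller distances and so tend to end up closer in the tree.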
Several outcomes are possible: a) one classification is better than the other; b) both reference classifications are equally successful when compared with the classification produced by the learning algorithm, but they account for different subsets of the data; c) both classifications are equally successful when compared with that produced by the learning algorithm, and they account for the same portion of the data; d) neither classification matches well with that produced by the learning algorithm. Each of these outcomes is potentially interesting. Outcome (a) suggests that there is a ‘right answer’, while (d) indicates that we have no independent external validation of either classification. Outcome (c) suggests that the two classifications are equivalent, while outcome (b) suggests equally powerful but conflicting principles of classification. Oto-Manguean languages are particularly attractive test cases for this method of investigating cross-cutting systems of classification.