Author: Gabriela BILBIIE

A quantitative study of Right Node Raising in the Penn Treebank

Abstract: Corpora are rarely used in the study of ellipsis (exceptions include Meyer 1995 and Greenbaum & Nelson 1999 on English, and K. Harbusch and G. Kempen on Dutch and German). This paper is based on a study of Right Node Raising (RNR) in English, using occurrences extracted from the Penn Treebank (PTB). The results are twofold: on the one hand, RNR appears to be much more frequent and much less constrained than is generally assumed; on the other hand, many claims made in the literature, in particular the semantic contrast constraint, are not borne out by the PTB data.
PTB tagging distinguishes two types of RNR: (i) Regular RNR (474 occurrences), with two symmetric and elliptical phrases, as in (3), and (ii) Parenthetical RNR (94 occurrences), in which the second phrase is parenthetical and elliptical, as in (4).
(3) Tonight a group of men *RNR*-1, tomorrow night he himself *RNR*-1, [would go out there somewhere and wait]VP-1. (brwn-12426)
(4) I mean, we all figured – I guess anybody’d figure *RNR*-1 – [Angie]NP-1. (brwn-21366)
Since most Parenthetical RNR occurrences seem closer to syntactic amalgams (involving weak verbs) than to RNR proper, we focus on the 474 occurrences tagged as Regular RNR. Our results concur with some recent studies (e.g. Chaves 2008) that challenge the traditional view of RNR: (i) RNR may occur in any syntactic context, and not only in coordinate constructions, as is usually assumed; (ii) RNR has very low frequency at the clausal level (the pattern most discussed in the literature), but high frequency at the sub-clausal level (VP, NP, PP); (iii) there seems to be no restriction on the syntactic category or grammatical function of the factorized chunk. Our investigation confirms the higher incidence of ellipsis in writing than in speech, but invalidates the claim that RNR is relatively uncommon compared with Gapping and Left Peripheral Ellipsis.
By restricting RNR to the clausal coordination domain, previous corpus studies excluded a substantial amount of data. Finally, we took a subset of 200 occurrences in the clausal and verbal domain in order to test the semantic contrast constraint, which is generally considered a necessary condition for RNR and Gapping (Hartmann 2000). The semantic relation between the strings across which RNR applies belongs to one of the following classes: (i) ‘veritable’ contrast, (ii) scalarity, and (iii) a ‘scenarios’ class, which covers a heterogeneous set of non-contrastive relations. Crucially, this last class contains relations that are not contrastive, which challenges the contrast requirement assumed to hold for RNR.
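As a rough illustration of how RNR occurrences of the kind counted above might be located, the following Python sketch scans a PTB-style bracketed tree for `*RNR*` null elements and reports, for each index, the number of traces and the category of the co-indexed shared constituent. The toy tree, the function name `rnr_summary`, and the regular expressions are assumptions made for illustration only; they are not the extraction procedure used in this study.

```python
import re
from collections import Counter

# Hypothetical toy tree in PTB-style bracketing (not taken from the corpus).
TOY_TREE = """
(S (NP-SBJ (PRP He))
   (VP (VP (VBZ likes) (NP (-NONE- *RNR*-1)))
       (CC and)
       (VP (VBZ buys) (NP (-NONE- *RNR*-1)))
       (NP-1 (JJ old) (NNS books))))
"""

# A null element of the form (-NONE- *RNR*-<index>).
RNR_TRACE = re.compile(r"\(-NONE- \*RNR\*-(\d+)\)")


def rnr_summary(tree: str) -> dict:
    """Map each RNR index to (number of traces, category of the shared node)."""
    counts = Counter(RNR_TRACE.findall(tree))
    summary = {}
    for index, n_traces in counts.items():
        # Shared constituent: a node label, optional function tags, then -<index>.
        target = re.search(r"\(([A-Z]+)(?:-[A-Z]+)*-" + index + r"\s", tree)
        summary[index] = (n_traces, target.group(1) if target else None)
    return summary


print(rnr_summary(TOY_TREE))
```

Tallying the category returned for each index over a whole corpus would give a distribution of shared constituents by syntactic category, on the assumption that the corpus files follow the same bracketing conventions as the toy tree.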