Contribution details

Author: Ulrich HEID

Co-author(s): Annette KLOSA, IDS, Mannheim, GERMANY; Jonas KUHN, IMS, Stuttgart, GERMANY

Title:
German compound participles: from corpus-based data acquisition to a dictionary for interactive use


Abstract: We propose a design study on the conception of a lexical resource enriched with morphological data; it should be usable both for Natural Language Processing and for interactive querying by humans. The targeted morphological phenomenon is German compound participles (Partizipkomposita), such as 'rebenumrankt' ('vine-clad'), 'problembeladen' ('loaded with problems'), 'marmorbelegt' ('laid out with marble'). As one version of the resource is for human use (and is in focus here), we first address user needs with respect to the treatment of word formation in electronic dictionaries (cf. Radtke/Heid 2012); then we discuss descriptive problems encountered with German compound participles, as well as possibilities to exploit their close relationship with syntactic constructions in (semi-)automatic data acquisition from corpora. We finally comment on the design of the lexical data collection. We start from the Function Theory of Lexicography (cf. Tarp 2008). In text understanding, users wish to get data about the reading(s) and the pragmatic properties of a complex word. For text production, they need to ascertain its existence and, with pragmatic data (frequency, register, style, ...), its contextual appropriateness. This includes data about competing word formation patterns and syntactic constructions usable instead of the complex word. The resource thus contains a careful combination of lexicographically processed data with data acquired from corpora. Corpus search should identify the syntactic construction underlying (and likely interchangeable with) a given compound participle, as well as (1) semantic groups of the nominal (or adjectival) elements (efeu- / rosen- / clematis- umrankt, vs. geheimnis- / legenden- umrankt), (2) syntactic and semantic readings of the verbs underlying the participles, and (3) correspondences between participle and syntactic construction.
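As an illustration of step (1), the following minimal Python sketch collects the non-head elements that occur with each participle in a word list. It is not the authors' acquisition procedure: the head list `PARTICIPLE_HEADS`, the functions `split_compound` and `group_modifiers`, and the sample tokens are hypothetical, and plain suffix matching stands in for real morphological analysis.

```python
from collections import defaultdict

# Hypothetical participle heads; a real resource would take these from a lexicon
# or from morphological analysis, not from a hand-written tuple.
PARTICIPLE_HEADS = ("umrankt", "beladen", "belegt")


def split_compound(word, heads=PARTICIPLE_HEADS):
    """Return (modifier, head) if `word` ends in a known head, else None.

    Assumption: simple suffix matching. Linking elements stay attached to the
    modifier (e.g. 'reben' rather than 'rebe').
    """
    lowered = word.lower()
    # Try longer heads first so that the most specific head wins.
    for head in sorted(heads, key=len, reverse=True):
        if lowered.endswith(head) and len(lowered) > len(head):
            return lowered[: -len(head)], head
    return None


def group_modifiers(words, heads=PARTICIPLE_HEADS):
    """Map each participle head to the sorted list of modifiers seen with it."""
    groups = defaultdict(set)
    for word in words:
        parts = split_compound(word, heads)
        if parts is not None:
            modifier, head = parts
            groups[head].add(modifier)
    return {head: sorted(mods) for head, mods in groups.items()}


# Sample tokens built from the examples in the abstract (illustrative only).
corpus_tokens = [
    "rebenumrankt", "efeuumrankt", "rosenumrankt",
    "geheimnisumrankt", "legendenumrankt",
    "problembeladen", "marmorbelegt",
    "umrankt",  # bare participle, not a compound: skipped
]

groups = group_modifiers(corpus_tokens)
for head, modifiers in groups.items():
    print(head, modifiers)
```

The sketch only gathers the modifiers per participle; separating them into semantic groups (plants such as efeu- / rosen- versus abstract nouns such as geheimnis- / legenden-) would need an additional semantic resource or manual lexicographic classification.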
From parsed data, the predicate-argument structures of the verbs can be deduced, and from these, the syntactico-semantic structure of the compound participles can be inferred. We will also address the issue that the verbal constructions seem to allow a wider range of lexical material than the compounds.

References:
Radtke, Janina; Heid, Ulrich (2012): "Word formation in electronic language resources: state of the art analysis and requirements for the future", in: Proceedings of Euralex 2012, Oslo, pp. 794-802.
Tarp, Sven (2008): Lexicography in the Borderland between Knowledge and Non-knowledge: General Lexicographical Theory with particular Focus on Learner's Lexicography. Tübingen: Max Niemeyer.