Contribution details
Author: Violeta SERETAN
Title:
Two computational linguistics applications and a bridge between them
Abstract: Automatically deriving the syntactic structure of a sentence ("syntactic parsing") and automatically identifying typical word co-occurrences in a corpus ("collocation extraction") are two central computational linguistics tasks. Both contribute to the broader goal of language understanding; however, they are generally approached independently of each other. I argue that these tasks are so closely interlinked that each must rely on the other to address challenges such as lexical and structural disambiguation, which affect performance in both cases. I propose an integrated approach in which the two tasks are coupled so that they can exchange information, thereby improving the performance of both applications. First, I will present a collocation extraction methodology in which syntactic information plays a key role in identifying candidate word combinations, before statistical association measures are applied. Second, I will discuss a parsing approach in which information from a collocation lexicon is used to constrain the parser and guide it through the maze of alternative analyses. The integrated approach benefits each application: extraction results are significantly better when parsing information is available, and collocations in turn increase parsing precision and coverage. These results show that combining syntactic parsing and collocation extraction is both feasible and useful. Together with similar results reported in the literature for other scenarios, they suggest that computational linguistics work, which is often fragmented, becomes more efficient when performed synergistically.
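The extraction pipeline described in the abstract (syntactic candidate identification followed by statistical association measures) can be illustrated with a minimal sketch. The abstract does not name the association measure, so this sketch assumes Dunning's log-likelihood ratio (G^2), a common choice for collocation extraction. It also assumes that a parser has already produced (head, dependent) pairs for a single syntactic relation such as verb-object. The function names `log_likelihood_ratio` and `rank_pairs` and the toy input are hypothetical and illustrative only.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical candidate format: (head, dependent) in one syntactic relation,
# e.g. verb-object pairs as output by a parser (assumed, not specified in the abstract).
Pair = Tuple[str, str]


def log_likelihood_ratio(a: int, b: int, c: int, d: int) -> float:
    """Dunning's G^2 statistic for a 2x2 contingency table.

    a = f(u, v), b = f(u, not v), c = f(not u, v), d = f(not u, not v).
    """
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    observed = (a, b, c, d)
    expected = (row[0] * col[0] / n, row[0] * col[1] / n,
                row[1] * col[0] / n, row[1] * col[1] / n)
    # Cells with a zero observed count contribute 0 (limit of o*ln(o/e) as o -> 0).
    g2 = sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)
    return 2.0 * g2


def rank_pairs(pairs: Iterable[Pair]) -> List[Tuple[Pair, float]]:
    """Score syntactically identified candidate pairs and sort them by G^2, highest first.

    Only positively associated pairs (observed count above chance expectation) are kept,
    because G^2 alone does not distinguish attraction from repulsion.
    """
    pairs = list(pairs)
    n = len(pairs)
    pair_counts = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(dep for _, dep in pairs)
    scored = []
    for (head, dep), a in pair_counts.items():
        if a * n <= head_counts[head] * dep_counts[dep]:
            continue  # not more frequent than expected under independence
        b = head_counts[head] - a  # same head, other dependent
        c = dep_counts[dep] - a    # same dependent, other head
        d = n - a - b - c          # neither the head nor the dependent
        scored.append(((head, dep), log_likelihood_ratio(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Toy verb-object candidates (hypothetical data, not from the source).
    candidates = ([("take", "decision")] * 5 + [("take", "book")]
                  + [("read", "book")] * 3 + [("read", "decision")])
    for pair, score in rank_pairs(candidates):
        print(pair, round(score, 2))
```

Counting marginals per syntactic relation, rather than over a fixed word window, reflects the idea that candidates are identified syntactically before any statistics are computed; the choice of G^2 and the positive-association filter are design assumptions of this sketch.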