Author: Lyne DA SYLVA

Co-author(s): Laure GUITARD

Lexical semantics and automatic indexing: using semantic relations, predicates and arguments

Abstract: This work concerns the application of linguistics to information science, specifically to document indexing. We explore the contribution of lexical semantics to automatic document indexing by proposing an analysis of a class of vocabulary that plays a specific role in indexing. Indexing has a significant linguistic basis: descriptors are linguistic expressions, either extracted from the document or paraphrased from it. In human indexing, however, words are used in a particular way. First, frequency matters: words that are highly frequent across the collection are ineffective as index terms. Second, indexing favours expressions belonging to specialized terminology (e.g., “art nouveau”, “satellite dish”). Yet the words that are highly frequent within a document also include non-specialized words, the so-called “basic scholarly vocabulary” (BSV): “development”, “importance”, “form”, etc. Although normally excluded as stand-alone index terms, these words prove useful as qualifiers of specialized terms (e.g., “art nouveau – development”, “satellite dish – importance”). Given a list of BSV, automatic indexing can pair these words with collocated specialized terms to produce such structured entries. Characterizing the BSV is the goal of our research. We show how the BSV can be structured using thesaural relations (synonymy, hypernymy, association) by drawing parallels with existing thesauri, including WordNet. We also sketch an analysis of the semantic arguments of these words based on FrameNet, which suggests a different treatment of predicates and arguments in indexing languages.
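The pairing of BSV words with collocated specialized terms can be illustrated with a minimal sketch. It assumes that "collocated" simply means "occurring in the same sentence", and it uses a tiny hand-made BSV list and term list. The vocabularies, the function names (`sentences`, `structured_entries`) and the sentence-splitting heuristic are all hypothetical illustrations, not the authors' method or lists.

```python
import re
from collections import Counter

# Hypothetical sample vocabularies (illustration only; not the paper's actual lists).
BSV = {"development", "importance", "form"}
TERMS = {"art nouveau", "satellite dish"}


def sentences(text):
    """Split text into lowercase sentences on '.', '!' or '?' (a crude heuristic)."""
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def structured_entries(text, bsv=BSV, terms=TERMS):
    """Pair each specialized term with the BSV words found in the same sentence.

    Assumption: same-sentence co-occurrence stands in for collocation.
    BSV words are never emitted alone, only as qualifiers of a term.
    Returns a Counter mapping 'term – qualifier' strings to co-occurrence counts.
    """
    entries = Counter()
    for sent in sentences(text):
        words = set(re.findall(r"[a-z]+", sent))
        # Multi-word terms are matched as whole phrases within the sentence.
        found_terms = [t for t in terms if re.search(r"\b" + re.escape(t) + r"\b", sent)]
        found_bsv = words & bsv
        for term in found_terms:
            for qualifier in found_bsv:
                entries[f"{term} – {qualifier}"] += 1
    return entries


if __name__ == "__main__":
    doc = ("The development of art nouveau was rapid. "
           "Art nouveau shows a new form. "
           "The satellite dish has some importance.")
    for entry, count in sorted(structured_entries(doc).items()):
        print(entry, count)
```

A realistic system would replace the sentence window with a proper collocation measure and would draw the BSV list from the kind of characterization the paper proposes.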