Computer Science Journal of Moldova (Oct 2012)

Toward the Soundness of Sense Structure Definitions in Thesaurus-Dictionaries. Parsing Problems and Solutions

  • Neculai Curteanu,
  • Alex Moruz

Journal volume & issue
Vol. 20, no. 3(60)
pp. 275 – 303

Abstract

In this paper we point out some difficult problems of thesaurus-dictionary entry parsing, relying on the parsing technology of SCD (Segmentation-Cohesion-Dependency) configurations, which has been successfully applied to six of the largest thesauri: Romanian (2), French, German (2), and Russian.

Challenging Problems: (a) Intricate and/or recursive structures of the lexicographic segments met in the entries of certain thesauri; (b) Cyclic (recursive) calls of some sense marker classes on marker sequences; (c) Establishing the hypergraph-driven dependencies between all the atomic and non-atomic sense definitions. The classical approach to solving these parsing problems is hard, mainly because of the depth-first search of sense definitions and markers, the substantial complexity of entries, and the dynamic construction of the sense tree embodied within these parsers.

SCD-based Parsing Solutions: (a) The SCD parsing method is a procedural tool, completely free of formal grammars, which handles the recursive structure of the lexicographic segments by procedural non-recursive calls performed on the SCD parsing configurations of the entry structure. (b) For dealing with cyclic (recursive) calls between secondary sense markers and the sense enumeration markers, we proposed the Enumeration Closing Condition, sometimes coupled with New_Paragraphs typographic markers transformed into numeral sense enumeration. (c) These problems, their lexicographic modeling, and their parsing solutions are addressed both to dictionary parser programmers, who can experience the SCD-based parsing method, and to lexicographers and thesauri designers, for tailoring balanced lexical-semantics granularities and sounder sense tree definitions of the dictionary entries.

Keywords