IJCoL (Dec 2020)

Linguistically-driven Selection of Difficult-to-Parse Dependency Structures

  • Chiara Alzetta,
  • Felice Dell’Orletta,
  • Simonetta Montemagni,
  • Giulia Venturi

DOI
https://doi.org/10.4000/ijcol.719
Journal volume & issue
Vol. 6, no. 2
pp. 37 – 60

Abstract

This paper presents a novel methodology with a twofold goal: quantifying the reliability of automatically generated dependency relations without using gold data, and identifying the linguistic constructions that negatively affect parser performance. These objectives are typically investigated in different lines of research, with different methods and techniques. Our methodology, at the crossroads of these perspectives, makes it possible not only to quantify the parsing reliability of individual dependency types, but also to identify and weight the contextual properties that make relation instances more or less difficult to parse. The methodology was tested in two different and complementary experiments, aimed at assessing the degree of parsing difficulty across (a) different dependency relation types and (b) different instances of the same relation. The results show that it can identify difficult-to-parse dependency relations without relying on gold data, while taking into account a variety of intertwined linguistic factors. These findings pave the way for novel applications of the methodology, both in defining new evaluation metrics based purely on automatically parsed data and in the automatic creation of challenge sets.