IEEE Access (Jan 2018)
Acronyms as an Integral Part of Multi-Word Term Recognition – A Token of Appreciation
Abstract
Term conflation is the process of linking together different variants of the same term. In automatic term recognition approaches, all term variants should be aggregated into a single normalized term representative, which is associated with a single domain-specific concept as a latent variable. In a previous study, we described FlexiTerm, an unsupervised method for recognition of multiword terms from a domain-specific corpus. It uses a range of methods to normalize three types of term variation: orthographic, morphological, and syntactic. Acronyms, which represent a highly productive type of term variation, were not supported. In this paper, we describe how the functionality of FlexiTerm has been extended to recognize acronyms and incorporate them into the term conflation process. The main contribution of this paper is not acronym recognition per se, but rather its integration with other types of term variation into the term conflation process. We evaluated the effects of term conflation in the context of information retrieval, one of its most prominent applications. On average, relative recall increased by 32 percentage points, whereas the index compression factor increased by 7 percentage points. The evidence therefore suggests that integrating acronyms provides a nontrivial improvement in term conflation.
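To make the idea of conflating acronyms with other term variants concrete, the following is a minimal sketch and not FlexiTerm's actual implementation. It assumes a toy normalization (lowercasing and hyphen removal), a hand-supplied acronym dictionary, and the definition of index compression factor commonly used in stemming evaluation, (n - s) / n, where n is the number of distinct terms before conflation and s the number of groups after it. The function names `normalize`, `conflate`, and `index_compression_factor` and the example terms are hypothetical.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Toy normalization (assumption): lowercase, treat hyphens as spaces,
    and collapse repeated whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


def conflate(terms, acronyms):
    """Group term variants under one normalized representative.

    `acronyms` maps an acronym to its long form. It is supplied by hand
    here; recognizing acronyms automatically is outside this sketch.
    """
    groups = defaultdict(set)
    for term in terms:
        expanded = acronyms.get(term, term)  # expand known acronyms only
        groups[normalize(expanded)].add(term)
    return dict(groups)


def index_compression_factor(n_before: int, n_after: int) -> float:
    """(n - s) / n: the fractional reduction in index size (assumed definition)."""
    return (n_before - n_after) / n_before


# Hypothetical term variants, as they might appear in a corpus.
terms = [
    "anterior cruciate ligament",
    "Anterior Cruciate Ligament",
    "ACL",
    "knee joint",
    "knee-joint",
]
acronyms = {"ACL": "anterior cruciate ligament"}

without_acronyms = conflate(terms, {})
with_acronyms = conflate(terms, acronyms)

n = len(set(terms))
icf_without = index_compression_factor(n, len(without_acronyms))
icf_with = index_compression_factor(n, len(with_acronyms))
```

In this toy example, treating "ACL" as a variant of its long form merges one extra group, so the index compression factor is higher with acronym conflation than without. The figures reported in the abstract come from the authors' evaluation, not from a sketch like this one.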
Keywords