Information Research: An International Electronic Journal (Jan 2002)

Compounds in dictionary-based Cross-language information retrieval_revised

Journal volume & issue
Vol. 7, no. 2
p. 128

Abstract

Read online

Compound words form an important part of natural language. From the cross-lingual information retrieval (CLIR) point of view it is important that many natural languages are highly productive with compounds, and translation resources cannot include entries for all compounds. Also, compounds are often content bearing words in a sentence. In Swedish, German and Finnish roughly one tenth of the words in a text prepared for information retrieval purposes are compounds. Important research questions concerning compound handling in dictionary-based cross-language information retrieval are 1) compound splitting into components, 2) normalisation of components, 3) translation of components and 4) query structuring for compounds and their components in the target language. The impact of compound processing on the performance of the cross-language information retrieval process is evaluated in this study and the results indicate that the effect is clearly positive.