Ostium (Dec 2013)

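The Characteristics of Terminology of Sciences in relation to Family from point of view of Terminological Extraction (Les caractéristiques de la terminologie des sciences relatives à la famille du point de vue de l’extraction terminologique)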

  • Ágoston Nagy

Journal volume & issue
Vol. 9, no. 4

Abstract

Nagy, Á.: The Characteristics of Terminology of Sciences in relation to Family from point of view of Terminological Extraction.

According to Eugen Wüster, terms are lexical units belonging to a scientific domain, where each is tied to the concept it denotes; terms must therefore have a precise definition. In term extraction, terms are recognised mainly by their morphosyntactic patterns: noun+noun, for example, is a typical term pattern in French (e.g. navigateur web). The first aim of this article is to identify the typical term patterns in the social sciences domain and their frequencies. To this end, three social science articles were chosen as a corpus, on the criterion that they frequently contain the words famille 'family' and/or individu 'individual'. All terms in the three articles were annotated manually. The second aim is to compare the term pattern frequencies in the social sciences with the results of previous research on the terms of a computer science corpus. The broader aim is to determine whether an automatic term extractor fine-tuned for computer science texts could also be used on a social sciences corpus. For this purpose, problematic patterns, such as adjectives preceding the nominal head of a term, are also examined. The results show that the IT corpus and the social sciences corpus follow the same tendency; however, juxtaposed nouns are less frequent in the latter, which prefers the noun-adjective sequence. As for the problematic patterns, the two corpora show no important differences: such patterns are rare in both (~7%). The same rule-based extractor could therefore work well on both corpora; however, psychological and sociological terms occur more often in everyday language, which makes statistical filtering more difficult.
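The pattern-based recognition step described in the abstract can be illustrated with a minimal sketch. It assumes the text has already been part-of-speech tagged (Universal Dependencies-style tags are used here), and it is not the extractor evaluated in the article: the pattern inventory, the function names `extract_candidates` and `pattern_shares`, and the toy input (including the example famille nucléaire) are all hypothetical.

```python
from collections import Counter

# Hypothetical two-word pattern inventory; the article's full list of
# patterns is not given in the abstract.
PATTERNS = {
    ("NOUN", "NOUN"): "noun+noun",
    ("NOUN", "ADJ"): "noun+adjective",
    ("ADJ", "NOUN"): "adjective+noun",  # adjective before the nominal head
}


def extract_candidates(tagged):
    """Return (phrase, pattern_name) for each adjacent token pair matching a pattern."""
    candidates = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        name = PATTERNS.get((t1, t2))
        if name is not None:
            candidates.append((f"{w1} {w2}", name))
    return candidates


def pattern_shares(candidates):
    """Return the relative frequency of each pattern among the candidates."""
    counts = Counter(name for _, name in candidates)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


# Hand-tagged toy input, for illustration only (not taken from the article's corpus).
tagged = [
    ("le", "DET"), ("navigateur", "NOUN"), ("web", "NOUN"),
    ("et", "CCONJ"), ("la", "DET"), ("famille", "NOUN"), ("nucléaire", "ADJ"),
]

candidates = extract_candidates(tagged)
shares = pattern_shares(candidates)
print(candidates)
print(shares)
```

A rule-based step like this only proposes candidates. As the abstract notes, a statistical filter is still needed afterwards, and it is harder to tune when the terms also occur often in everyday language.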

Keywords