RUDN Journal of Language Studies, Semiotics and Semantics (Mar 2022)
Structural Models of English Terms of Automated Processing of Scientific and Technical Texts Corpora
Abstract
The article is devoted to the structural models of English multi-component terms from the subject area Welding types as a basis for annotating corpora of scientific and technical texts. It outlines the place of scientific and technical text corpora within corpus linguistics and the prospects for further research based on them. The relevance of the research stems from the need to create a corpus of scientific and technical texts in general, and tools for the automatic annotation of terms in particular. It is argued that the main problem in creating such a corpus is the automatic annotation of terminological word combinations. The current state of the terminology system of the subject area Welding types is analyzed, and the formal structure of its elements is considered. The results of the analysis of two-, three- and four-component English terminological word combinations of the Welding types subject area and their structural models are presented, and every structural model is illustrated with examples. The most productive models of English terminological word combinations are identified. It is shown that the most productive model, the combination of a head (nucleus) element with a noun or an adjective functioning as a prepositive modifier, can be traced in two-component word combinations; the analysis of more complex formations shows that the model of a left modifier attached to the term head is also present in them, demonstrating generic features. The need to enumerate all possible structural models of terminological combinations in the subject area Welding types is substantiated.
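The notion of a structural model and its productivity can be made concrete with a small sketch. It assumes that each term has already been split into components and tagged with a part of speech; the tag set (N = noun, A = adjective, Part = participle), the `'+'`-joined notation such as `N+N`, and the sample terms are all hypothetical and illustrative, not the article's data or counts. The helper names `structural_model` and `most_productive_by_length` are likewise invented for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical sample of Welding types terms with hand-assigned tags
# (N = noun, A = adjective, Part = participle); illustrative, not the article's data.
SAMPLE_TERMS = [
    [("arc", "N"), ("welding", "N")],
    [("spot", "N"), ("welding", "N")],
    [("friction", "N"), ("welding", "N")],
    [("electric", "A"), ("welding", "N")],
    [("manual", "A"), ("arc", "N"), ("welding", "N")],
    [("automatic", "A"), ("arc", "N"), ("welding", "N")],
    [("submerged", "Part"), ("arc", "N"), ("welding", "N")],
    [("gas", "N"), ("metal", "N"), ("arc", "N"), ("welding", "N")],
]


def structural_model(term):
    """Return the structural model of a term as a '+'-joined tag sequence, e.g. 'A+N'."""
    return "+".join(tag for _, tag in term)


def most_productive_by_length(terms):
    """For each component count, return (model, frequency) of the most frequent model."""
    by_length = defaultdict(Counter)
    for term in terms:
        by_length[len(term)][structural_model(term)] += 1
    return {
        length: counter.most_common(1)[0]
        for length, counter in sorted(by_length.items())
    }


if __name__ == "__main__":
    for length, (model, freq) in most_productive_by_length(SAMPLE_TERMS).items():
        print(f"{length}-component: {model} (frequency {freq})")
```

Grouping by component count mirrors the separate treatment of two-, three- and four-component combinations; in the longer sample models the head noun stays in final position while modifiers accumulate on its left.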
The novelty of the study lies in the formation of a database of structural models of terminological combinations as the basis for a supplementary (superstructure) database on the structure of terms, intended to improve the quality of automatic annotation of scientific and technical text corpora and the processing of candidate terms in corpus studies.
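One way a database of structural models could support automatic annotation is as a lookup of tag sequences over part-of-speech-tagged text. The sketch below is an assumption, not the article's procedure: `MODEL_DB`, `SENTENCE` and `mark_term_candidates` are hypothetical names, the tags are assigned by hand in place of a real tagger, and the matching strategy (greedy longest match) is one simple choice among several.

```python
# Hypothetical database of structural models (tag sequences); in a real system it would be
# filled from an analysis of the subject area's terminology.
MODEL_DB = {
    ("N", "N"),
    ("A", "N"),
    ("A", "N", "N"),
    ("Part", "N", "N"),
    ("N", "N", "N", "N"),
}

# Hypothetical POS-tagged sentence; tags are assigned by hand here, standing in for a tagger.
SENTENCE = [
    ("The", "Det"), ("submerged", "Part"), ("arc", "N"), ("welding", "N"),
    ("process", "N"), ("uses", "V"), ("a", "Det"), ("granular", "A"), ("flux", "N"),
]


def mark_term_candidates(tagged_tokens, models):
    """Greedy longest match: return (start, end) token spans whose tag sequence is a known model."""
    tags = [tag for _, tag in tagged_tokens]
    max_len = max(len(model) for model in models)
    spans = []
    i = 0
    while i < len(tags):
        end = None
        # Try the longest window first so multi-component terms win over nested sub-terms.
        for length in range(min(max_len, len(tags) - i), 0, -1):
            if tuple(tags[i:i + length]) in models:
                end = i + length
                break
        if end is None:
            i += 1
        else:
            spans.append((i, end))
            i = end  # continue after the marked candidate; spans never overlap
    return spans


if __name__ == "__main__":
    for start, end in mark_term_candidates(SENTENCE, MODEL_DB):
        print(" ".join(word for word, _ in SENTENCE[start:end]))
```

Here "submerged arc welding" and "granular flux" are marked as candidates. The output depends entirely on which models the database contains, which is why an incomplete list of structural models would leave some multi-component terms unmarked. Practical term-extraction pipelines usually add further filtering on top of such pattern matching.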
Keywords