Rasprave Instituta za Hrvatski Jezik i Jezikoslovlje (Jan 2023)

Lithuanian-English Cybersecurity Termbase: Principles of Data Collection and Structuring

  • Sigita Rackevičienė,
  • Andrius Utka,
  • Agnė Bielinskienė,
  • Liudmila Mockienė

DOI
https://doi.org/10.31724/rihjj.49.2.12
Journal volume & issue
Vol. 49, no. 2
pp. 439 – 461

Abstract

The paper presents the compilation and structuring principles, scope, and development possibilities of a bilingual Lithuanian-English cybersecurity termbase. It discusses different approaches to terminology management, whose best practices were used to collect cybersecurity terminology and compile the termbase. Data collection was based mainly on semasiological and corpus-driven approaches, which involved creating deep learning systems trained to extract terminology from cybersecurity corpora. To make the dataset systematic and comprehensive, onomasiological and corpus-based approaches were also incorporated into the data collection process. The termbase design decisions (macrostructure and microstructure) follow onomasiological principles, while term variation is handled with a descriptive approach. The termbase was developed in Terminologue, an open-source, cloud-based terminology management platform. To ensure interoperability, the termbase was exported to the TBX format and deposited in the CLARIN-LT repository. The paper also discusses the possibilities of publishing the terminological data as linguistic linked open data and linking it with other terminological resources and cybersecurity ontologies. The termbase is expected to be useful to cybersecurity specialists, translators, terminographers, lexicographers and the general public, and to contribute to the development of Lithuanian cybersecurity terminology.

Keywords