SpaceTransformers: Language Modeling for Space Systems

Audrey Berquand; Paul Darm; Annalisa Riccardi

doi:10.1109/ACCESS.2021.3115659

IEEE Access (Jan 2021)

Affiliations

Audrey Berquand: Department of Mechanical and Aerospace, Intelligent Computational Engineering Laboratory, University of Strathclyde, Glasgow G1 1XQ, U.K.
Paul Darm: Department of Mechanical and Aerospace, Intelligent Computational Engineering Laboratory, University of Strathclyde, Glasgow G1 1XQ, U.K.
Annalisa Riccardi: Department of Mechanical and Aerospace, Intelligent Computational Engineering Laboratory, University of Strathclyde, Glasgow G1 1XQ, U.K.

DOI: https://doi.org/10.1109/ACCESS.2021.3115659
Journal volume & issue: Vol. 9
pp. 133111 – 133122

Abstract

The transformer architecture and transfer learning have radically changed the Natural Language Processing (NLP) landscape, enabling new applications in fields where open-source labelled datasets are scarce. Space systems engineering is one such field: access to large labelled corpora is limited, and there is a need for better reuse of accumulated design data. However, transformer models such as the Bidirectional Encoder Representations from Transformers (BERT) and the Robustly Optimised BERT Pretraining Approach (RoBERTa) are trained on general corpora. To answer the need for domain-specific contextualised word embeddings in the space field, we propose SpaceTransformers, a novel family of three models, SpaceBERT, SpaceRoBERTa and SpaceSciBERT, further pre-trained from BERT, RoBERTa and SciBERT, respectively, on our domain-specific corpus. We collect and label a new dataset of space systems concepts based on space standards. We fine-tune our domain-specific models and compare them to their general counterparts on a domain-specific Concept Recognition (CR) task. Our study demonstrates that the models further pre-trained on a space corpus outperform their respective baseline models on the CR task, with SpaceRoBERTa achieving a significantly higher ranking overall.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
