Exploring Natural Language Processing in Model-To-Model Transformations

Paulius Danenas; Tomas Skersys

doi:10.1109/ACCESS.2022.3219455

IEEE Access (Jan 2022)

Affiliations

Paulius Danenas: ORCiD; Center of Information Systems Design Technologies, Kaunas University of Technology, Kaunas, Lithuania
Tomas Skersys: Center of Information Systems Design Technologies, Kaunas University of Technology, Kaunas, Lithuania

DOI: https://doi.org/10.1109/ACCESS.2022.3219455
Journal volume & issue: Vol. 10
pp. 116942 – 116958

Abstract

In this paper, we explore the possibility of applying natural language processing (NLP) in visual model-to-model (M2M) transformations. We present our research results on information extraction from text labels in process models expressed in Business Process Modeling Notation (BPMN) and use case models depicted in Unified Modeling Language (UML), using the most recent developments in NLP. We focus on three relevant tasks: the extraction of verb/noun phrases that would be used to form relations, the parsing of conjunctive/disjunctive statements, and the detection of abbreviations and acronyms. To solve the relation extraction task, we implemented techniques that combine state-of-the-art NLP language models with formal, regular expression grammar-based structure detection. To this end, we benchmark the most recent state-of-the-art NLP tools (CoreNLP, Stanford Stanza, Flair, spaCy, AllenNLP, BERT, ELECTRA), as well as custom BERT-BiLSTM-CRF and ELMo-BiLSTM-CRF implementations trained with certain data augmentations to improve performance on the most ambiguous cases; these tools are then used to extract noun and verb phrases from the short text labels generally used in UML and BPMN models. Furthermore, we describe our attempts to improve these extractors by solving the abbreviation/acronym detection problem with machine learning-based detection and by processing conjunctive and disjunctive statements, both of which are relevant to advanced text normalization. The results show that the Stanza-based implementation achieved the best phrase extraction and conjunctive phrase processing performance, yet our trained BERT-BiLSTM-CRF outperformed it on the verb phrase detection task. While this work was inspired by our ongoing research on partial model-to-model transformations, we believe it is applicable in other areas requiring similar text processing capabilities as well.
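
The idea of regular expression grammar-based structure detection over tagger output can be illustrated with a minimal sketch. It is not the authors' implementation: the patterns `NP_PATTERN` and `VP_PATTERN`, the helper `extract_phrases`, and the sample label are all hypothetical, and the part-of-speech tags are written by hand in Penn Treebank style, whereas in practice they would come from one of the benchmarked taggers.

```python
import re

# Hypothetical chunk grammars over Penn Treebank-style POS tags.
# These are illustrative assumptions, not the grammar used in the paper.
NP_PATTERN = re.compile(r"(?:<DT>)?(?:<JJ>)*(?:<NN[A-Z]*>)+")  # optional determiner, adjectives, nouns
VP_PATTERN = re.compile(r"(?:<VB[A-Z]*>)+")                    # one or more verb forms


def extract_phrases(tagged, pattern):
    """Return the word sequences whose POS-tag sequence matches `pattern`.

    `tagged` is a list of (word, tag) pairs, as a POS tagger might produce.
    """
    # Encode the tag sequence as a string such as "<VB><JJ><NN>" and
    # remember the character offset at which each token's tag starts.
    tag_string = ""
    starts = []
    for _, tag in tagged:
        starts.append(len(tag_string))
        tag_string += f"<{tag}>"

    phrases = []
    for match in pattern.finditer(tag_string):
        # Every match begins at a "<" and ends after a ">", so both
        # boundaries coincide with token boundaries.
        first = starts.index(match.start())
        if match.end() < len(tag_string):
            last = starts.index(match.end())
        else:
            last = len(tagged)
        phrases.append(" ".join(word for word, _ in tagged[first:last]))
    return phrases


# Hand-tagged example of a short BPMN-style task label (tags are assumed).
label = [("Send", "VB"), ("approved", "JJ"), ("invoice", "NN"),
         ("to", "TO"), ("customer", "NN")]

print(extract_phrases(label, VP_PATTERN))  # verb phrases
print(extract_phrases(label, NP_PATTERN))  # noun phrases
```

Matching on the tag string rather than on the words keeps the grammar independent of vocabulary, so the quality of the extracted phrases depends mostly on how well the tagger handles short, often imperative labels.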

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
