IEEE Access (Jan 2022)

Exploring Natural Language Processing in Model-To-Model Transformations

  • Paulius Danenas,
  • Tomas Skersys

DOI
https://doi.org/10.1109/ACCESS.2022.3219455
Journal volume & issue
Vol. 10
pp. 116942 – 116958

Abstract

Read online

In this paper, we explore the possibility to apply natural language processing in visual model-to-model (M2M) transformations. Therefore, we present our research results on information extraction from text labels in process models modeled using Business Process Modeling Notation (BPMN) and use case models depicted in Unified Modeling Language (UML) using the most recent developments in natural language processing (NLP). Here, we focus on three relevant tasks, namely, the extraction of verb/noun phrases that would be used to form relations, parsing of conjunctive/disjunctive statements, and the detection of abbreviations and acronyms. Techniques combining state-of-the-art NLP language models with formal regular expressions grammar-based structure detection were implemented to solve relation extraction task. To achieve these goals, we benchmark the most recent state-of-the-art NLP tools (CoreNLP, Stanford Stanza, Flair, Spacy, AllenNLP, BERT, ELECTRA), as well as custom BERT-BiLSTM-CRF and ELMo-BiLSTM-CRF implementations, trained with certain data augmentations to improve performance on the most ambiguous cases; these tools are further used to extract noun and verb phrases from short text labels generally used in UML and BPMN models. Furthermore, we describe our attempts to improve these extractors by solving the abbreviation/acronym detection problem using machine learning-based detection, as well as process conjunctive and disjunctive statements, due to their relevance to performing advanced text normalization. The obtained results show that the best phrase extraction and conjunctive phrase processing performance was obtained using Stanza based implementation, yet, our trained BERT-BiLSTM-CRF outperformed it for the verb phrase detection task. While this work was inspired by our ongoing research on partial model-to-model transformations, we believe it to be applicable in other areas requiring similar text processing capabilities as well.

Keywords