Hybridizing Fuzzy String Matching and Machine Learning for Improved Ontology Alignment

Mohammed Suleiman Mohammed Rudwan; Jean Vincent Fonou-Dombeu

doi:10.3390/fi15070229

Future Internet (Jun 2023)

Hybridizing Fuzzy String Matching and Machine Learning for Improved Ontology Alignment

Mohammed Suleiman Mohammed Rudwan,
Jean Vincent Fonou-Dombeu

Affiliations

Mohammed Suleiman Mohammed Rudwan: School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg 3201, South Africa
Jean Vincent Fonou-Dombeu: School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg 3201, South Africa

DOI: https://doi.org/10.3390/fi15070229
Journal volume & issue: Vol. 15, no. 7
p. 229

Abstract

Read online

Ontology alignment has become an important process for identifying similarities and differences between ontologies, to facilitate their integration and reuse. To this end, fuzzy string-matching algorithms have been developed for strings similarity detection and have been used in ontology alignment. However, a significant limitation of existing fuzzy string-matching algorithms is their reliance on lexical/syntactic contents of ontology only, which do not capture semantic features of ontologies. To address this limitation, this paper proposed a novel method that hybridizes fuzzy string-matching algorithms and the Deep Bidirectional Transformer (BERT) deep learning model with three machine learning regression classifiers, namely, K-Nearest Neighbor Regression (kNN), Decision Tree Regression (DTR), and Support Vector Regression (SVR), to perform the alignment of ontologies. The use of the kNN, SVR, and DTR classifiers in the proposed method resulted in the building of three similarity models (SM), encoded SM-kNN, SM-SVR, and SM-DTR, respectively. The experiments were conducted on a dataset obtained from the anatomy track in the Ontology Alignment and Evaluation Initiative 2022 (OAEI 2022). The performances of the SM-kNN, SM-SVR, and SM-DTR models were evaluated using various metrics including precision, recall, F1-score, and accuracy at thresholds 0.70, 0.80, and 0.90, as well as error rates and running times. The experimental results revealed that the SM-SVR model achieved the best recall of 1.0, while the SM-DTR model exhibited the best precision, accuracy, and F1-score of 0.98, 0.97, and 0.98, respectively. Furthermore, the results showed that the SM-kNN, SM-SVR, and SM-DTR models outperformed state-of-the-art alignment systems that participated in the OAEI 2022 challenge, indicating the superior capability of the proposed method.

Published in Future Internet

ISSN: 1999-5903 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: http://www.mdpi.com/journal/futureinternet/

About the journal

Abstract

Keywords