Soft Bigram distance for names matching

Mohammed Hadwan; Mohammed A. Al-Hagery; Maher Al-Sanabani; Salah Al-Hagree

doi:10.7717/peerj-cs.465

PeerJ Computer Science (Apr 2021)

Affiliations

Mohammed Hadwan: Department of Information Technology, College of Computer, Qassim University, Buraydah, Saudi Arabia
Mohammed A. Al-Hagery: Department of Computer Science, College of Computer, Qassim University, Buraydah, Saudi Arabia
Maher Al-Sanabani: Faculty of Computer Science and Information Systems, Thamar University, Thamar, Yemen
Salah Al-Hagree: Department of Computer Sciences & Information Technology, IBB University, IBB, Yemen

Journal volume & issue: Vol. 7, p. e465

Abstract

Background. Bigram distance (BI-DIST) is a recent approach to measuring the distance between two strings, a task that plays an important role in a wide range of applications across many areas. Because of its representational and computational efficiency, BI-DIST has prompted extensive research aimed at improving its efficiency further. However, developing an algorithm that measures string distance both accurately and efficiently remains a major challenge for many developers. This research therefore aims to design an algorithm that matches names accurately. BI-DIST is considered the best orthographic measure for name identification; nevertheless, it lacks a distance scale between name bigrams.

Methods. This research proposes the Soft Bigram Distance (Soft-Bidist) measure, an extension of BI-DIST that softens the scale of comparison among name bigrams to improve name matching. Different datasets are used to demonstrate the efficiency of the proposed method.

Results. The results show that Soft-Bidist outperforms the compared algorithms on different name matching datasets.
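The abstract describes BI-DIST only at a high level. As a rough illustration, the sketch below assumes one common formulation of a bigram distance: each name is padded with one leading symbol so that a name of length K yields K bigrams, two bigrams cost the fraction of their two positions that differ (0, 0.5 or 1), and an edit-distance recurrence over the two bigram sequences is normalised by the longer sequence length. The padding symbol, the function names, and especially `soft_cost` with its `swap_cost` weight are hypothetical. `soft_cost` only shows where a softer comparison scale would plug in; it is not the Soft-Bidist weighting from the paper.

```python
from typing import Callable, List

PAD = "\0"  # hypothetical padding symbol, assumed never to occur in names


def bigrams(name: str) -> List[str]:
    """Pad with one leading symbol so a name of length K yields K bigrams."""
    padded = PAD + name
    return [padded[i:i + 2] for i in range(len(name))]


def binary_cost(a: str, b: str) -> float:
    """Assumed BI-DIST-style cost: fraction of the two positions that differ."""
    return sum(x != y for x, y in zip(a, b)) / 2


def soft_cost(a: str, b: str, swap_cost: float = 0.25) -> float:
    """Hypothetical softened cost (illustrative only, NOT the paper's weights):
    reversed bigrams such as 'ab' vs 'ba' cost swap_cost instead of 1."""
    if a != b and a == b[::-1]:
        return swap_cost
    return binary_cost(a, b)


def bigram_distance(s: str, t: str,
                    cost: Callable[[str, str], float] = binary_cost) -> float:
    """Edit distance over bigram sequences, normalised to [0, 1]."""
    gs, gt = bigrams(s), bigrams(t)
    k, l = len(gs), len(gt)
    if max(k, l) == 0:
        return 0.0
    # d[i][j]: distance between the first i bigrams of s and the first j of t
    d = [[0.0] * (l + 1) for _ in range(k + 1)]
    for i in range(1, k + 1):
        d[i][0] = float(i)
    for j in range(1, l + 1):
        d[0][j] = float(j)
    for i in range(1, k + 1):
        for j in range(1, l + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,  # delete a bigram of s
                d[i][j - 1] + 1,  # insert a bigram of t
                d[i - 1][j - 1] + cost(gs[i - 1], gt[j - 1]),  # substitute
            )
    return d[k][l] / max(k, l)
```

For example, `bigram_distance("smith", "smyth")` gives 0.2, since only the bigrams "mi"/"my" and "it"/"yt" differ, each by one position. For the transposition "ab" vs "ba", the binary cost gives 0.75 while the hypothetical `soft_cost` gives 0.375, which shows how a softer scale can bring near-miss name variants closer together.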

Published in PeerJ Computer Science

ISSN: 2376-5992 (Online)
Publisher: PeerJ Inc.
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://peerj.com/computer-science/