A Heuristic Approach for Finding Similarity Indexes of Multivariate Data Sets

Rahim Khan; Muhammad Zakarya; Ayaz Ali Khan; Izaz Ur Rahman; Mohd Amiruddin Abd Rahman; Muhammad Khalis Abdul Karim; Mohd Shafie Mustafa

doi:10.1109/ACCESS.2020.2968222

IEEE Access (Jan 2020)

Affiliations

Rahim Khan: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
Muhammad Zakarya: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
Ayaz Ali Khan: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
Izaz Ur Rahman: Department of Computer Science, Abdul Wali Khan University, Mardan, Pakistan
Mohd Amiruddin Abd Rahman: Faculty of Science, Universiti Putra Malaysia, Serdang, Malaysia
Muhammad Khalis Abdul Karim: Faculty of Science, Universiti Putra Malaysia, Serdang, Malaysia
Mohd Shafie Mustafa: Faculty of Science, Universiti Putra Malaysia, Serdang, Malaysia

DOI: https://doi.org/10.1109/ACCESS.2020.2968222
Journal volume & issue: Vol. 8
pp. 21759 – 21769

Abstract

Multivariate data sets (MDSs) of enormous size, containing a certain proportion of noise and outliers, are generated routinely in various application domains. A major issue tightly coupled with these MDSs is how to compute their similarity indexes with the available resources in the presence of noise and outliers, a problem that has been addressed by both classical and non-metric-based approaches. However, classical techniques are sensitive to outliers, and most non-classical approaches are either problem- or application-specific or overly complex. The research community therefore strongly encourages the development of an efficient and reliable algorithm for MDSs with minimal time and space complexity. In this paper, a non-metric-based similarity measure algorithm for MDSs is presented that successfully addresses the aforementioned issues, particularly noise and computational time. The technique finds the similarity indexes of noisy MDSs, of both equal and variable sizes, while using minimal resources, i.e., space and time. Experiments were conducted on both benchmark and real-time MDSs to evaluate the proposed algorithm's performance against its rival algorithms, namely traditional dynamic-programming-based and sequential similarity measure algorithms. Experimental results show that the proposed scheme outperforms its counterpart algorithms in terms of time and space and effectively tolerates a considerable portion of noisy data.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639