Anomaly Detection Algorithms for Low-Dimensional and High-Dimensional Data: A Critical Study

Mujeeb Ur Rehman; Muhammad Waseem; Abdul Sattar; Muti Ullah

doi:10.51846/vol6iss4pp42-49

Pakistan Journal of Engineering & Technology (Mar 2024)

Affiliations

Mujeeb Ur Rehman: Department of Software Engineering, University of Management and Technology, Sialkot, Pakistan
Muhammad Waseem: Department of Artificial Intelligence, University of Management and Technology, Sialkot, Pakistan
Abdul Sattar: Department of Cyber Security, Khwaja Fareed University of Engineering and IT, Abudhabi Road, Rahim Yar Khan, Pakistan
Muti Ullah: Department of Computer Science, Khwaja Fareed University of Engineering and IT, Abudhabi Road, Rahim Yar Khan, Pakistan

Journal volume & issue: Vol. 6, no. 4

Abstract

Anomaly detection identifies suspicious events or objects that deviate from the norm in a dataset. It is critical in many application domains, e.g., preventing credit card theft and spotting network intrusions. Anomalies can be detected at the global level as well as at the local level. A global outlier is a data point that lies outside the norm for the entire dataset, while a local outlier may lie within the norm for the entire dataset but deviate from its surrounding data points. Existing methods for identifying local outliers are inadequate. Therefore, better algorithms are required to handle high-velocity data and identify local outliers. Machine learning and data mining techniques need to be investigated to determine their pros and cons for identifying anomalies within data. The density-based Local Outlier Factor (LOF) method is considered the best choice for identifying local outliers. While many variants of LOF exist for low-dimensional data, none are suitable for high-dimensional data. This study discusses the LOF, Connectivity-based Outlier Factor (COF), and Cluster-Based Local Outlier Factor (CBLOF) methods for spotting local outliers in low- and high-dimensional data. The performance of these density-based algorithms is examined with respect to the number of dimensions, in terms of accuracy and time complexity. In this scenario, CBLOF achieves outstanding results owing to its distinctive cluster-based approach to local outlier detection.
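
To illustrate the distinction between global and local outliers drawn in the abstract, the following is a minimal Python sketch of the standard LOF score. It is not the authors' implementation. The sample points, the function names (k_nearest, lof_scores), and the choice k = 3 are assumptions made for illustration. The sketch takes exactly k neighbours per point (ties at the k-distance are not expanded) and assumes the data contain no duplicate points.

```python
import math


def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]


def lof_scores(points, k=3):
    """Compute a simplified Local Outlier Factor for every point.

    Scores near 1 mean a point is about as dense as its neighbours;
    scores well above 1 mean it sits in a sparser region than they do.
    """
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(p, o) = max(k_dist[o], dist(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a dense cluster, a sparse cluster, and one point that
# sits just outside the dense cluster.
dense = [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1), (0.05, 0.05)]
sparse = [(5.0, 5.0), (5.0, 7.0), (7.0, 5.0), (7.0, 7.0), (6.0, 6.0)]
candidate = (0.6, 0.6)
points = dense + sparse + [candidate]

scores = lof_scores(points, k=3)

# Nearest-neighbour distance of every point, as a crude global criterion.
nn_dist = [k_nearest(points, i, 1)[0][0] for i in range(len(points))]

for p, s, d in zip(points, scores, nn_dist):
    print(p, "LOF =", round(s, 2), "nearest-neighbour distance =", round(d, 2))
```

In this toy dataset the candidate point is closer to its nearest neighbour than any point of the sparse cluster is to its own, so a single global distance threshold would not single it out. Its LOF score is nevertheless the largest, because its density is compared only with that of its immediate neighbours in the dense cluster.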

Published in Pakistan Journal of Engineering & Technology

ISSN: 2664-2042 (Print); 2664-2050 (Online)
Publisher: The University of Lahore
Country of publisher: Pakistan
LCC subjects: Technology
Website: https://sites2.uol.edu.pk/journals/index.php/pakjet/index
