Pakistan Journal of Engineering & Technology (Mar 2024)
Anomaly Detection Algorithms for Low-Dimensional and High-Dimensional Data: A Critical Study
Abstract
Anomaly detection identifies suspicious events or objects that differ from the norm in data. It is critical in many application domains, e.g., preventing credit card fraud and detecting network intrusions. Anomalies can be detected at the global level as well as at the local level. A global outlier is a data point that lies beyond the norm for the entire dataset, whereas a local outlier may lie within the norm for the entire dataset but deviate from its neighboring data points. Existing methods for identifying local outliers are inadequate. Therefore, better algorithms are required to handle high-velocity data and identify local outliers. Machine learning and data mining techniques need to be investigated to determine their strengths and weaknesses in identifying anomalies within data. The density-based Local Outlier Factor (LOF) method is a strong choice for identifying local outliers. While many variants of LOF exist for low-dimensional data, none are suitable for high-dimensional data. This study discusses the LOF, Connectivity-based Outlier Factor (COF), and Cluster-Based Local Outlier Factor (CBLOF) methods for detecting local outliers in low-dimensional and high-dimensional data. The performance of these density-based algorithms is examined with respect to data dimensionality, in terms of accuracy and time complexity. In this setting, CBLOF achieves outstanding results owing to its cluster-based approach to local outlier detection.
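The distinction between global and local outliers drawn above can be illustrated with a minimal pure-Python sketch of the standard LOF score. It is not the implementation evaluated in this study: the function names `k_nearest` and `lof_scores`, the toy one-dimensional data, and the choice k = 3 are all assumptions made for illustration. The sketch also simplifies LOF by ignoring ties at the k-distance and by assuming no duplicate points.

```python
import math

def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of points[i]."""
    dists = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return dists[:k]  # simplification: ties at the k-th distance are cut off

def lof_scores(points, k):
    """Compute a LOF score per point (hypothetical helper, illustration only)."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour
    k_dist = [nb[-1][0] for nb in neigh]
    # local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o))
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))  # assumes no duplicate points
    # LOF: mean density of the neighbours divided by the point's own density
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]

# Assumed toy data: a dense cluster, a sparse cluster, and one point (index 5)
# that sits close to the dense cluster in absolute terms.
dense = [(0.0,), (0.1,), (0.2,), (0.3,), (0.4,)]
candidate = [(1.4,)]
sparse = [(10.0,), (12.0,), (14.0,), (16.0,), (18.0,)]
points = dense + candidate + sparse

scores = lof_scores(points, k=3)
for p, s in zip(points, scores):
    print(p[0], round(s, 2))
```

In this toy dataset the point at 1.4 is only 1.0 away from its nearest neighbour, which is less than the spacing of 2.0 inside the sparse cluster, so a single global distance threshold cannot separate it from the sparse points. Its LOF score, however, is well above 1, while every other point scores close to 1, because the score compares each point's density with that of its own neighbours.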
Keywords