ARO-The Scientific Journal of Koya University (Sep 2022)
An Investigation on Disparity Responds of Machine Learning Algorithms to Data Normalization Method
Abstract
Data normalization can eliminate the effect of inconsistent feature ranges in some machine learning (ML) techniques and speed up the optimization process in others. Many studies apply different data normalization methods with the aim of reducing or eliminating the impact of data variance on the accuracy of ML-based models. However, how significant this impact is, and how it aligns with the mathematical foundations of the ML algorithms, still needs further investigation and testing. To that end, this work proposes an investigation methodology involving three different ML algorithms: support vector machine (SVM), artificial neural network (ANN), and Euclidean-based K-nearest neighbor (E-KNN). Throughout this work, five datasets have been utilized, each taken from a different application field and having different statistical properties. Although many data normalization methods are available, this work focuses on the min-max method, because it actively eliminates the effect of inconsistent ranges across the datasets. Moreover, other factors that challenge the process of min-max normalization, such as including or excluding outliers or the least significant feature, have also been considered in this work. The findings of this work show that each ML technique responds differently to min-max normalization. The performance of the SVM models improved, while no significant improvement occurred in the performance of the ANN models. It is concluded that the performance of E-KNN models may improve or degrade with min-max normalization, depending on the statistical properties of the dataset.
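The min-max method referenced above rescales each feature to a common range, typically [0, 1], so that no single feature dominates distance- or margin-based computations. A minimal sketch in Python illustrates the idea; the function name and the guard for constant-valued features are illustrative choices, not taken from the paper:

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each feature (column) of X to the [0, 1] range.

    Applies x' = (x - min) / (max - min) per feature. Columns with a
    zero range (constant features) are mapped to 0 to avoid division
    by zero -- an illustrative choice, not specified in the paper.
    """
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # guard against constant columns
    return (X - col_min) / col_range

# Two features with very different ranges end up on the same scale:
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])
X_norm = min_max_normalize(X)
```

After normalization, both columns span [0, 1], so a Euclidean distance (as used by E-KNN) no longer weights the larger-ranged feature more heavily.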
Keywords