Jurnal Ilmiah Kursor: Menuju Solusi Teknologi Informasi (May 2024)

THE INFLUENCE OF DATA CATEGORIZATION AND ATTRIBUTE INSTANCES REDUCTION USING THE GINI INDEX ON THE ACCURACY OF THE CLASSIFICATION ALGORITHM MODEL

  • Willy Fernando
  • Deny Jollyta
  • Dadang Priyanto
  • Dwi Oktarina

DOI
https://doi.org/10.21107/kursor.v12i3.372
Journal volume & issue
Vol. 12, no. 3

Abstract

Problems with numerical data typically stem from a failure to understand the data and the results of processing it. To provide richer context and a deeper understanding of the facts, numerical data must be transformed into categories. However, changes to the data have a significant impact on the outcome of the analysis. The purpose of this study is to examine how transforming numerical data into categories affects the models produced by classification algorithms. The study uses the Maternal Health Risk dataset. Categorization is based on formal arrangements. In addition, the Gini Index is used to reduce the number of instances of an attribute. Classification is carried out with the Random Forest (RF), K-Nearest Neighbor (K-NN), and Support Vector Machine (SVM) algorithms to produce a model. The influence of the data modifications on the model is observed through the confusion matrix under five different data splits. The results suggest that converting numerical data to categorical data significantly improved the performance of the SVM model, from 76.92% to 80.77%, at a 95/5 data split.
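
As a rough illustration of the two preprocessing ideas named in the abstract, the sketch below bins a numeric attribute into categories and scores the resulting attribute with a weighted Gini index. The abstract does not give the exact procedure, so the function names, the cut points, and the category labels here are all hypothetical, and the code should be read as one possible interpretation rather than the authors' method.

```python
from collections import Counter, defaultdict


def gini_impurity(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_index(values, labels):
    """Weighted Gini impurity of the labels after grouping by attribute value."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[value].append(label)
    n = len(labels)
    return sum(len(g) / n * gini_impurity(g) for g in groups.values())


def categorize(value, cut_points, names):
    """Map a number to a category name.

    cut_points must be ascending, and names must hold one more
    entry than cut_points (hypothetical convention for this sketch).
    """
    for cut, name in zip(cut_points, names):
        if value < cut:
            return name
    return names[-1]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the Maternal Health Risk dataset.
    ages = [17, 25, 31, 42, 50, 19]
    risk = ["high", "low", "low", "high", "high", "low"]
    age_groups = [categorize(a, [20, 35], ["young", "adult", "older"]) for a in ages]
    print(age_groups)
    print(gini_index(age_groups, risk))
```

A lower weighted Gini index means the categories separate the classes more cleanly, which is one way candidate groupings of an attribute could be compared before training a classifier.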

Keywords