Sistemasi: Jurnal Sistem Informasi (Jan 2022)

Performance Analysis of Random Forest Using Attribute Normalization

  • Arie Nugroho,
  • Abdullah Husin

DOI
https://doi.org/10.32520/stmsi.v11i1.1681
Journal volume & issue
Vol. 11, no. 1
pp. 186 – 196

Abstract

Data mining extracts patterns from historical data to support future human activity. Its methods are commonly divided into classification, clustering, association, and forecasting. This study uses classification to learn a pattern from a dataset so that the pattern can be used to predict decisions for new data. A dataset used for classification must have a label, or class, for each record. Datasets in which the class labels are unevenly distributed (imbalanced datasets) can distort the resulting model and its predictions for new data. To address this problem, this research combines an ensemble method with pre-processing. The ensemble learning algorithm used is random forest, and the pre-processing step is attribute normalization, which converts nominal data to numeric data. Random forest is an extension of the decision tree, which produces a tree-shaped pattern showing the flow of the classification process. The random forest is trained on the data after attribute normalization has been applied. The study aims to apply attribute normalization together with the random forest algorithm to handle imbalanced datasets and to measure the resulting accuracy. It uses a public dataset from the UCI Repository, namely Car Evaluation. With 100 trees, the method achieves an accuracy of approximately 99% using 90% training data and 10% testing data, and approximately 95.95% using 8-fold cross-validation.
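The attribute normalization step and the 8-fold evaluation split described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact nominal-to-numeric mapping, so the ordinal orderings in `ORDINAL_LEVELS` are an assumption, and the helper names `encode_row` and `k_fold_indices` are hypothetical. The attribute names and values follow the published description of the Car Evaluation dataset (1,728 records, six nominal attributes).

```python
# Assumed ordinal orderings for the six Car Evaluation attributes.
# The paper's actual mapping is not stated; these are illustrative only.
ORDINAL_LEVELS = {
    "buying":   ["low", "med", "high", "vhigh"],
    "maint":    ["low", "med", "high", "vhigh"],
    "doors":    ["2", "3", "4", "5more"],
    "persons":  ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety":   ["low", "med", "high"],
}
ATTRIBUTES = list(ORDINAL_LEVELS)  # keeps the column order above


def encode_row(row):
    """Map one record of nominal values to a list of integer codes."""
    return [ORDINAL_LEVELS[name].index(value)
            for name, value in zip(ATTRIBUTES, row)]


def k_fold_indices(n_samples, k=8):
    """Yield (train_idx, test_idx) pairs for k contiguous, unshuffled folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


# One record in attribute order: buying, maint, doors, persons, lug_boot, safety.
sample = ["vhigh", "med", "5more", "4", "big", "high"]
encoded = encode_row(sample)
print(encoded)

# Eight folds over the 1,728 records of the Car Evaluation dataset.
folds = list(k_fold_indices(1728, k=8))
print(len(folds), len(folds[0][1]))
```

The encoded rows could then be passed to any random forest implementation configured with 100 trees, for example scikit-learn's `RandomForestClassifier(n_estimators=100)`. The folds here are contiguous for simplicity; with an imbalanced class distribution, shuffling or stratifying the folds would usually be preferable.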