Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) (Jul 2022)
Performance Analysis of Hybrid Machine Learning Methods on Imbalanced Data (Rainfall Classification)
Abstract
This study analyzes the performance of two hybrid machine learning methods, Voting and Stacking, for rainfall classification. Each hybrid method combines five classification methods: Logistic Regression, Support Vector Machine, Random Forest, Artificial Neural Network, and eXtreme Gradient Boosting. The data are rainfall records for Bandung City from 2005 to 2021. Both hybrid methods are ensemble techniques, which combine several individual classification models to improve the performance of the resulting model. The Voting algorithm performs poorly on imbalanced data, whereas Stacking does not share this weakness. The results show that when the five machine learning methods are combined on the imbalanced dataset, the Stacking algorithm achieves an accuracy of 99.60%, and with the addition of the SMOTE (Synthetic Minority Over-sampling Technique) oversampling technique its accuracy increases to 99.71%. Stacking performs better because its meta-learner combines the best predictions of the individual models, which also helps it overcome the class imbalance. The models are evaluated not only on accuracy but also on precision, recall, and F1-score. The contribution of this research is to identify which hybrid method, Voting or Stacking, yields the best model performance for rainfall classification.
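The Voting versus Stacking comparison, with and without SMOTE, can be sketched with scikit-learn roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses a synthetic imbalanced dataset from `make_classification` in place of the Bandung rainfall data, substitutes scikit-learn's `GradientBoostingClassifier` for eXtreme Gradient Boosting (to avoid the `xgboost` dependency), uses `MLPClassifier` as the Artificial Neural Network, assumes hard voting and a Logistic Regression meta-learner for Stacking, and implements a simplified binary SMOTE by hand. The helper names `smote`, `base_models`, `evaluate`, and `run_experiments` are hypothetical, and macro-averaging of precision, recall, and F1 is also an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier, VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def smote(X, y, minority_label, k=5, rng=None):
    """Simplified binary SMOTE (hypothetical helper): create synthetic minority
    samples by interpolating a random minority point toward one of its k nearest
    minority neighbours until both classes have the same count."""
    rng = np.random.default_rng(rng)
    X_min = X[y == minority_label]
    n_needed = int(np.sum(y != minority_label) - len(X_min))
    if n_needed <= 0:
        return X, y
    # Pairwise distances among minority samples; a point is never its own neighbour.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    k = min(k, len(X_min) - 1)
    neighbours = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X_min), size=n_needed)
    nb = neighbours[base, rng.integers(0, k, size=n_needed)]
    gap = rng.random((n_needed, 1))  # interpolation factor in [0, 1)
    X_new = X_min[base] + gap * (X_min[nb] - X_min[base])
    return (np.vstack([X, X_new]),
            np.concatenate([y, np.full(n_needed, minority_label)]))


def base_models(seed=0):
    # The five individual classifiers; GradientBoosting stands in for XGBoost (assumption).
    return [
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("svm", make_pipeline(StandardScaler(), SVC(random_state=seed))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("ann", make_pipeline(StandardScaler(),
                              MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                                            random_state=seed))),
        ("gb", GradientBoostingClassifier(random_state=seed)),
    ]


def evaluate(model, X_tr, y_tr, X_te, y_te):
    # Fit on the (possibly oversampled) training split, score on the untouched test split.
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    p, r, f, _ = precision_recall_fscore_support(y_te, pred, average="macro",
                                                 zero_division=0)
    return {"accuracy": accuracy_score(y_te, pred), "precision": p, "recall": r, "f1": f}


def run_experiments(n_samples=600, seed=0):
    # Synthetic 90/10 imbalanced binary data standing in for the rainfall dataset.
    X, y = make_classification(n_samples=n_samples, n_features=8, n_informative=5,
                               weights=[0.9, 0.1], random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y,
                                              random_state=seed)
    # Oversample the training split only, so the test set keeps its original imbalance.
    X_sm, y_sm = smote(X_tr, y_tr, minority_label=1, rng=seed)
    results = {}
    for tag, (Xf, yf) in {"": (X_tr, y_tr), "+smote": (X_sm, y_sm)}.items():
        voting = VotingClassifier(base_models(seed), voting="hard")
        stacking = StackingClassifier(base_models(seed),
                                      final_estimator=LogisticRegression(max_iter=1000),
                                      cv=5)
        results["voting" + tag] = evaluate(voting, Xf, yf, X_te, y_te)
        results["stacking" + tag] = evaluate(stacking, Xf, yf, X_te, y_te)
    return results


if __name__ == "__main__":
    for name, m in run_experiments().items():
        print(f"{name:15s} " + "  ".join(f"{k}={v:.3f}" for k, v in m.items()))
```

SMOTE is applied only to the training split so that the held-out scores still reflect the original class distribution. On data this imbalanced, macro-averaged recall and F1 are usually more informative than accuracy, since a classifier that always predicts the majority class already reaches roughly 90% accuracy.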
Keywords