Jurnal Infotel (Nov 2022)
KNN imputation to missing values of regression-based rain duration prediction on BMKG data
Abstract
Predicting rain duration from Meteorology, Climatology, and Geophysics Agency (BMKG) data is an important problem that remains open. Several studies have also shown that missing values degrade a model's predictive performance. This study proposes k-nearest neighbors (KNN) imputation to handle missing values when predicting rain duration. The dataset is drawn from BMKG data. For the regression model, we compared gradient boosting regression (GBR), adaptive boosting regression (ABR), and linear regression (LR). We compared KNN imputation against several benchmark methods: zero imputation, mean imputation, and iterative imputation. The performance of these imputation methods is measured by r2, mean squared error (MSE), and mean bias error (MBE). The results show that, among the regression methods, GBR performs best on both training and test data, with r2 = 0.915 and 0.776, respectively. The proposed KNN imputation outperforms the benchmark imputation methods. At a missing percentage of 90%, predictions using KNN imputation reach r2 = 0.71 and MSE = 0.36.
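The abstract names KNN imputation and three evaluation metrics without giving their form. The block below is a minimal, dependency-free sketch of one common reading, not the authors' implementation. It assumes that a missing entry is filled with the unweighted mean of that feature over the k nearest rows that observe it, that distance is Euclidean over the features two rows share (rescaled for the missing ones), and that MBE is signed as predicted minus observed. The function names and the toy rows are hypothetical.

```python
import math

def nan_euclidean(a, b):
    """Euclidean distance over features observed in both rows,
    rescaled by (total features / shared features)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common features: rows are not comparable
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))

def knn_impute(rows, k=2):
    """Replace each None with the mean of that feature over the k nearest
    rows that have the feature observed (assumed unweighted average)."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:  # leave the gap if no comparable donor exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled

def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mbe(y_true, y_pred):
    """Mean bias error, assumed sign convention: predicted minus observed."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical toy rows (not BMKG data); None marks a missing value.
rows = [
    [1.0, 2.0, 3.0],
    [1.0, 2.0, None],
    [2.0, 3.0, 5.0],
    [9.0, 9.0, 20.0],
]
filled = knn_impute(rows, k=2)
print(filled[1])
```

In this toy case the two rows closest to the incomplete row supply the values 3.0 and 5.0, so the gap is filled with their mean. The zero and mean imputation benchmarks would instead fill it with 0 or with the column mean regardless of how similar the other rows are.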
Keywords