Boosting Algorithm to Handle Unbalanced Classification of PM2.5 Concentration Levels by Observing Meteorological Parameters in Jakarta-Indonesia Using AdaBoost, XGBoost, CatBoost, and LightGBM
Remote Sensing and Geographic Information Sciences Research Group, Faculty of Earth Sciences and Technology, Bandung Institute of Technology, Bandung, Indonesia
Maengseok Noh
College of Information Technology and Convergence, Pukyong National University, Busan, South Korea
Air quality in the Jakarta area has become increasingly severe: according to the 2022 Air Quality Index (AQI) report, Jakarta ranks among the eight most polluted cities in the world. In particular, the latest air quality data for Jakarta and its surrounding areas from the Meteorological, Climatological, and Geophysical Agency (BMKG) of the Republic of Indonesia show that PM2.5 concentrations have increased, peaking at $148~\mu \text{g}/\text{m}^{3}$ in 2022. Although a classification system for this pollution is critical, the PM2.5 concentrations observed at the BMKG Kemayoran station in Jakarta turn out to form an unbalanced (class-imbalanced) dataset. In this work, we therefore apply supervised boosting algorithms to handle this unbalanced classification of PM2.5 concentration levels from meteorological patterns observed in Jakarta between 1 January 2015 and 7 July 2022. The boosting algorithms considered are Adaptive Boosting (AdaBoost), Extreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM). Our simulations show that boosting classification can significantly reduce bias, together with variance, under unbalanced within-class coefficients, yielding classification values of 62%, 34%, and 59% for the good, moderate, and unhealthy PM2.5 classes, respectively.
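The abstract does not spell out how a boosting learner copes with an unbalanced class distribution. As a hedged illustration only, and not the authors' pipeline (which relies on AdaBoost, XGBoost, CatBoost, and LightGBM implementations), the sketch below implements multiclass AdaBoost in its SAMME form with decision stumps, assuming inverse-class-frequency ("balanced") initial sample weights so that each PM2.5 class starts with the same total weight regardless of how many samples it has. The two features (wind speed, relative humidity), the toy values, and every function name here are hypothetical; exact `Fraction` weights are used only so that ties between equally good splits are broken deterministically by scan order.

```python
import math
from collections import Counter
from fractions import Fraction


def balanced_weights(y):
    """Inverse-class-frequency weights, normalized so each of the K classes carries 1/K in total."""
    counts = Counter(y)
    k = len(counts)
    raw = [Fraction(len(y), k * counts[label]) for label in y]
    total = sum(raw)
    return [w / total for w in raw]


def fit_stump(X, y, w, classes):
    """Best single-feature threshold split under weights w; each side predicts its weighted-majority class."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left, right = Counter(), Counter()
            for row, label, wi in zip(X, y, w):
                (left if row[j] <= t else right)[label] += wi
            lc = max(classes, key=lambda c: left[c])
            rc = max(classes, key=lambda c: right[c])
            err = sum(left.values()) - left[lc] + sum(right.values()) - right[rc]
            if best is None or err < best[0]:  # strict '<': the first split found wins ties
                best = (err, j, t, lc, rc)
    return best


def stump_predict(stump, row):
    _, j, t, lc, rc = stump
    return lc if row[j] <= t else rc


def samme_fit(X, y, n_rounds=5):
    """Multiclass AdaBoost (SAMME) with decision stumps, starting from class-balanced weights."""
    classes = sorted(set(y))
    K = len(classes)
    w = balanced_weights(y)
    ensemble = []  # list of (alpha, stump)
    for _ in range(n_rounds):
        stump = fit_stump(X, y, w, classes)
        err = stump[0] / sum(w)
        if err == 0:  # a perfect stump classifies the training set on its own
            return classes, [(1.0, stump)]
        if err >= 1 - Fraction(1, K):  # no better than chance for K classes: stop boosting
            break
        factor = (1 - err) / err * (K - 1)  # exp(alpha) in SAMME
        ensemble.append((math.log(float(factor)), stump))
        # Up-weight the samples this stump misclassified, then renormalize.
        w = [wi * factor if stump_predict(stump, row) != label else wi
             for row, label, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return classes, ensemble


def samme_predict(model, row):
    """Weighted vote of the stumps; each stump votes for one class with weight alpha."""
    classes, ensemble = model
    votes = Counter()
    for alpha, stump in ensemble:
        votes[stump_predict(stump, row)] += alpha
    return max(classes, key=lambda c: votes[c])


# Hypothetical toy data: [wind speed (m/s), relative humidity (%)] -> PM2.5 class.
# The "good" class is the majority (6 of 11), mimicking an unbalanced label distribution.
X = [[1, 95], [2, 90], [3, 85], [3.5, 80], [4, 78],
     [5, 70], [6, 65], [7, 60], [8, 55], [9, 50], [10, 45]]
y = ["unhealthy"] * 2 + ["moderate"] * 3 + ["good"] * 6

model = samme_fit(X, y, n_rounds=5)
print([samme_predict(model, row) for row in X] == y)  # True
```

Because the initial weights give the two "unhealthy" samples as much total weight as the six "good" samples, the first stump already isolates the minority class instead of predicting the majority everywhere. The scikit-learn-style `fit` methods of `AdaBoostClassifier`, `XGBClassifier`, `LGBMClassifier`, and `CatBoostClassifier` all accept a `sample_weight` argument, so weights of this kind (converted to floats) could be passed to those libraries instead of hand-rolling the booster.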