Logi (Jan 2024)

Classification of Particulate Matter (PM2.5) Concentrations Using Feature Selection and Machine Learning Strategies

  • Matara Caroline Mongina,
  • Nyambane Simpson Osano,
  • Yusuf Amir Okeyo,
  • Ochungo Elisha Akech,
  • Khattak Afaq

DOI
https://doi.org/10.2478/logi-2024-0008
Journal volume & issue
Vol. 15, no. 1
pp. 85 – 96

Abstract

This research employed machine learning approaches to classify particulate matter (PM2.5) concentrations as acceptable or unacceptable, using a dataset obtained from the Nairobi Expressway road corridor. The dataset comprised air quality, traffic volume, and meteorological data. The Boruta Algorithm (BA) was used in conjunction with the Random Forests (RF) classifier to select the most relevant features from the dataset. The BA analysis indicated that humidity was the most influential factor in determining air quality, closely followed by ‘day_of_week’ and the volume of traffic bound for Nairobi. Site temperature was found to be less significant. A comparison of machine learning classifiers for this task showed that the Extreme Gradient Boosting (XGBoost) classifier performed best in terms of Sensitivity (0.774), Specificity (0.943), F1-Score (0.833), and AU-ROC (0.874). The Binary Logistic Regression (BLR) model performed worse than the other ML models, with Sensitivity (0.244), Specificity (0.614), F1-Score (0.455), and AU-ROC (0.508). Predictions of PM2.5 concentrations could provide valuable insights to transport policymakers as they formulate urban transport policy.
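
For readers unfamiliar with the four metrics reported above, the following is a minimal sketch of how Sensitivity, Specificity, F1-Score, and AU-ROC can be computed for a binary classifier. It is illustrative only and is not the authors' code: the labels, scores, the 0.5 decision threshold, and the choice of 1 as the positive class are all assumptions made for the example, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of actual positives that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of actual negatives that were detected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written in terms of counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def au_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive receives a
    higher score than a randomly chosen negative (ties count as 0.5).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AU-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example data (NOT from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(sensitivity(tp, fn), specificity(tn, fp), f1_score(tp, fp, fn), au_roc(y_true, scores))
```

Sensitivity, Specificity, and F1-Score depend on the chosen decision threshold, whereas AU-ROC summarizes ranking quality across all thresholds, which is why the four values are reported side by side.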

Keywords