Atmosphere (Mar 2025)

Predictive Model with Machine Learning for Environmental Variables and PM2.5 in Huachac, Junín, Perú

  • Emery Olarte,
  • Jhonatan Gutierrez,
  • Gwayne Roque,
  • Juan J. Soria,
  • Hugo Fernandez,
  • Jackson Edgardo Pérez Carpio,
  • Orlando Poma

DOI
https://doi.org/10.3390/atmos16030323
Journal volume & issue
Vol. 16, no. 3
p. 323

Abstract

PM2.5 pollution is increasing and causing health problems. The objective of this study was to model the behavior of PM2.5AQI (the PM2.5 air quality index) using four machine learning (ML) predictive models: linear regression, lasso, ridge, and elastic net. A total of 16,543 records from the Huachac area of Junín, Peru, were used, with humidity (%) and temperature (°C) as regressors. The study focuses on PM2.5AQI and these environmental variables. Methods: Exploratory data analysis (EDA) and ML predictive models were applied. Results: PM2.5AQI is high in winter and spring, with averages of 52.6 and 36.9, respectively, and low in summer; the maximum occurs in September (spring) and the minimum in February (summer). The regression models yielded precise metrics for choosing the best model for predicting PM2.5AQI. Comparison with other research highlights the robustness of the chosen ML models, underlining the potential of ML for PM2.5AQI prediction. Conclusions: The selected predictive model has α = 0.1111111 and λ = 0.150025 and is given by PM2.5AQI = 83.0846522 − 10.302222000 (Humidity) − 0.1268124 (Temperature). The model has an adjusted R² of 0.1483206 and an RMSE of 25.36203, and it supports decision making in the care of the environment.
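
As an illustration only, the reported equation can be evaluated with a few lines of Python. This is a minimal sketch, not the authors' code: the function and constant names are hypothetical, and the abstract does not say how humidity and temperature were scaled before fitting, so the input values below are placeholders rather than real measurements.

```python
# Coefficients copied from the equation reported in the abstract.
INTERCEPT = 83.0846522
COEF_HUMIDITY = -10.302222
COEF_TEMPERATURE = -0.1268124


def predict_pm25_aqi(humidity: float, temperature: float) -> float:
    """Evaluate the reported linear equation for PM2.5AQI.

    Assumption: inputs use the same scaling as the authors' fitted
    model, which the abstract does not specify.
    """
    return (
        INTERCEPT
        + COEF_HUMIDITY * humidity
        + COEF_TEMPERATURE * temperature
    )


if __name__ == "__main__":
    # Placeholder inputs chosen for illustration, not taken from the data set.
    estimate = predict_pm25_aqi(humidity=0.5, temperature=12.0)
    print(f"Estimated PM2.5AQI: {estimate:.2f}")
```

Both coefficients are negative, so under this equation the estimated PM2.5AQI falls as humidity or temperature rises. Given the reported adjusted R² of about 0.15, such an estimate should be read as a rough tendency rather than a precise forecast.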

Keywords