Classification Prediction of PM10 Concentration Using a Tree-Based Machine Learning Approach

Wan Nur Shaziayani; Ahmad Zia Ul-Saufie; Sofianita Mutalib; Norazian Mohamad Noor; Nazatul Syadia Zainordin

doi:10.3390/atmos13040538

Atmosphere (Mar 2022)

Affiliations

Wan Nur Shaziayani: Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
Ahmad Zia Ul-Saufie: Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
Sofianita Mutalib: Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam 40450, Selangor, Malaysia
Norazian Mohamad Noor: Faculty of Civil Engineering Technology, Universiti Malaysia Perlis, Kompleks Pengajian Jejawi 3, Arau 02600, Perlis, Malaysia
Nazatul Syadia Zainordin: Department of Environment, Faculty of Forestry and Environment, Universiti Putra Malaysia, Seri Kembangan 43400, Selangor, Malaysia

DOI: https://doi.org/10.3390/atmos13040538
Journal volume & issue: Vol. 13, no. 4
p. 538

Abstract

PM10 prediction has received considerable attention because of the harmful effects of PM10 on human health. Machine learning approaches have the potential to predict and classify future PM10 concentrations accurately. In this study, three machine learning algorithms, namely decision tree (DT), boosted regression tree (BRT), and random forest (RF), were applied to predict PM10 in Kota Bharu, Kelantan. The three methods were compared to find the best one for predicting the next day's PM10 concentration, using daily maximum data from January 2002 to December 2017. Of these data, 80% were used for training and 20% for validating the models. Model performance was assessed for RF, BRT, and DT in terms of accuracy, sensitivity, specificity, and precision. The results indicate that all three models were developed effectively and could be applied to the prediction of other atmospheric environmental data. The best model for classifying the next day's PM10 concentration was the random forest classifier, with an accuracy of 98.37%, sensitivity of 97.19%, specificity of 99.55%, and precision of 99.54%. The boosted regression tree performed close to the RF model, with an accuracy of 98.12%, sensitivity of 97.51%, specificity of 98.72%, and precision of 98.71%. The best model can help local governments issue early warnings to people at risk of acute and chronic health consequences from air pollution.
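The evaluation setup described in the abstract (next-day classification, an 80/20 training/validation split, and accuracy, sensitivity, specificity, and precision) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a binary scheme in which day t is labelled 1 when the next day's daily maximum PM10 exceeds a threshold, and it assumes a chronological split. The threshold and the helper names `next_day_labels`, `split_80_20`, and `classification_metrics` are hypothetical. The DT, BRT, and RF models themselves are omitted. In practice, a library such as scikit-learn (`DecisionTreeClassifier`, `RandomForestClassifier`, or `GradientBoostingClassifier` as one possible stand-in for a boosted tree model) would be fitted on the training portion.

```python
from typing import List, Sequence, Tuple


def next_day_labels(daily_max: Sequence[float], threshold: float) -> List[int]:
    """Label day t as 1 if day t+1's maximum PM10 exceeds `threshold`, else 0.

    ASSUMPTION: the binary scheme and `threshold` are illustrative only.
    The last day has no following day, so it receives no label.
    """
    return [1 if daily_max[t + 1] > threshold else 0 for t in range(len(daily_max) - 1)]


def split_80_20(rows: Sequence, train_fraction: float = 0.8) -> Tuple[list, list]:
    """ASSUMPTION: time-ordered split, first 80% for training, last 20% for validation."""
    cut = int(len(rows) * train_fraction)
    return list(rows[:cut]), list(rows[cut:])


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict:
    """Accuracy, sensitivity, specificity and precision (in %) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def pct(num: int, den: int) -> float:
        # Return NaN rather than dividing by zero when a class is absent.
        return 100.0 * num / den if den else float("nan")

    return {
        "accuracy": pct(tp + tn, tp + tn + fp + fn),
        "sensitivity": pct(tp, tp + fn),  # true-positive rate (recall)
        "specificity": pct(tn, tn + fp),  # true-negative rate
        "precision": pct(tp, tp + fp),    # positive predictive value
    }


if __name__ == "__main__":
    # Toy values only; these are not data from the study.
    labels = next_day_labels([40, 60, 45, 120, 30], threshold=50)
    train, valid = split_80_20(labels)
    print(labels, train, valid)
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0, 1, 0],
                                 [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]))
```

Sensitivity and specificity are reported separately because accuracy alone can look high when one class dominates. With imbalanced exceedance days, a model that rarely predicts the minority class may still score well on accuracy.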

Published in Atmosphere

ISSN: 2073-4433 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Physics: Meteorology. Climatology
Website: http://www.mdpi.com/journal/atmosphere/
