Machine learning methods to predict particulate matter PM2.5 [version 1; peer review: 2 approved]

Naveen Palanichamy; Su-Cheng Haw; Subramanian S; Kuhaneswaran Govindasamy; Rishanti Murugan

F1000Research (Apr 2022)

Affiliations

Naveen Palanichamy: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia
Su-Cheng Haw: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia
Subramanian S: Department of Electrical Engineering, Annamalai University, Chidambaram, Tamil Nadu, 608002, India
Kuhaneswaran Govindasamy: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia
Rishanti Murugan: Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Selangor, 63100, Malaysia

Journal volume & issue: Vol. 11

Abstract

Introduction: Air pollution in cities across the world has been steadily increasing in recent years. An increasing trend in particulate matter, PM2.5, is a threat because it can lead to serious consequences such as the worsening of asthma and cardiovascular disease. The metric used to measure air quality is the air pollutant index (API). In Malaysia, machine learning (ML) techniques for predicting PM2.5 have received less attention because research has concentrated on other air pollutants. To fill this gap, this study compares supervised ML techniques for accurately predicting PM2.5 concentrations in the smart cities of Malaysia, which can help to mitigate its adverse effects.

Methods: ML models for forecasting PM2.5 concentrations were investigated on Malaysian air quality data sets from 2017 to 2018. The dataset was preprocessed by data cleaning and normalization. Next, feature extraction reduced it to an informative dataset with location and time factors. The dataset was then fed into three supervised ML classifiers: random forest (RF), artificial neural network (ANN) and long short-term memory (LSTM). Finally, their outputs were evaluated using the confusion matrix and compared to identify the best model for the accurate prediction of PM2.5.

Results: The RF model obtained an accuracy of 97.7% in predicting PM2.5, compared with 61.14% for ANN and 61.77% for LSTM.

Discussion: RF performed well compared with ANN and LSTM for the given data with minimal features. RF reached good accuracy because the model learns from random samples using decision trees and takes the majority vote of their predictions.
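
The normalization and confusion-matrix evaluation steps named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code: the abstract does not say which normalization was used, so min-max scaling is an assumption here, and the class labels, readings and predictions are hypothetical toy values rather than data from the study.

```python
from collections import Counter


def min_max_normalize(values):
    """Scale numbers to [0, 1] (assumed scheme; the abstract does not name one)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j]: samples whose true label is labels[i], predicted labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(matrix):
    """Share of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical PM2.5 categories and toy classifier output (not study data).
labels = ["good", "moderate", "unhealthy"]
y_true = ["good", "good", "moderate", "unhealthy", "moderate", "good"]
y_pred = ["good", "moderate", "moderate", "unhealthy", "moderate", "good"]

cm = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(cm)

# Hypothetical raw readings scaled before being passed to a model.
scaled = min_max_normalize([12.0, 35.0, 58.0])
```

In this toy case five of the six samples fall on the diagonal of `cm`, so `acc` is 5/6. The accuracies reported in the Results would be computed the same way from each model's confusion matrix.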

Published in F1000Research

ISSN: 2046-1402 (Online)
Publisher: F1000 Research Ltd
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://f1000research.com
