Semi-supervised urban haze pollution prediction based on multi-source heterogeneous data

Zuhan Liu; Lili Wang

Heliyon (Jun 2024)

Affiliations

Zuhan Liu: School of Information Engineering, Nanchang Institute of Technology, Nanchang, China; Corresponding author.
Lili Wang: College of Science, Nanchang Institute of Technology, Nanchang, China

Journal volume & issue: Vol. 10, no. 12
p. e33332

Abstract

Particulate matter (PM) is defined by the Texas Commission on Environmental Quality (TCEQ) as “a mixture of solid particles and liquid droplets found in the air”. These particles vary widely in size; those with an aerodynamic diameter of less than 2.5 μm are known as Particulate Matter 2.5, or PM2.5. Urban haze pollution, represented by PM2.5, is becoming increasingly serious, which makes air pollution monitoring very important. However, because of their high cost, the number of air monitoring stations is limited. Our work focuses on integrating multi-source heterogeneous data for Nanchang, China, including taxi tracks, human mobility, road networks, points of interest (POIs), meteorology (e.g., temperature, dew point, humidity, wind speed, wind direction, atmospheric pressure, weather activity, weather conditions), and PM2.5 forecast data from air monitoring stations. This research presents an innovative approach to air quality prediction that integrates these data sets from various sources and utilizes diverse architectures. To this end, semi-supervised learning techniques are used, namely the collaborative training algorithm Co-Training (Co-T) and its further adjusted variant, Tri-Training (Tri-T). The objective is to accurately estimate haze pollution by integrating and using these multi-source heterogeneous data. We achieved this for the first time by employing a semi-supervised co-training strategy to accurately estimate pollution levels after applying the U-Air system to environmental data. In particular, the algorithm of the U-Air system is reproduced on the highly diverse heterogeneous data of Nanchang City, and the semi-supervised Co-T and Tri-T algorithms are used to conduct more detailed urban haze pollution prediction. Compared with Co-T, which trains a time classifier (TC) and a subspace classifier (SC) separately from the temporal and spatial perspectives, Tri-T is more accurate and faster, with a testing accuracy of up to 85.62 %. The forecast results also demonstrate the potential of the city's multi-source heterogeneous data and the effectiveness of semi-supervised learning. We hope that this synthesis will motivate atmospheric environmental officials, scientists, and environmentalists in China to explore machine learning technology for controlling the discharge of pollutants and for environmental management.
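
The co-training idea mentioned in the abstract, two classifiers that each see a different view of the same samples and exchange confident pseudo-labels, can be illustrated with a small sketch. This is not the authors' implementation: the nearest-centroid classifier is a stand-in for the paper's TC and SC, the data are synthetic, and all names (`temporal_view`, `spatial_view`, `co_train`, `rounds`, `per_round`) are hypothetical choices made for illustration.

```python
import random

def centroids(X, y):
    """Per-class mean vector (a nearest-centroid stand-in for TC / SC)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict(cents, x):
    """Return (label, confidence); confidence is the squared-distance margin."""
    dists = sorted((sum((a - b) ** 2 for a, b in zip(x, c)), label)
                   for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
    return dists[0][1], margin

def fit_view(view, labeled):
    """Train one view's classifier on the currently labeled indices."""
    idx = list(labeled)
    return centroids([view[i] for i in idx], [labeled[i] for i in idx])

def co_train(view_a, view_b, seed_labels, rounds=5, per_round=4):
    """Each view in turn pseudo-labels its most confident unlabeled samples
    and adds them to the shared labeled pool used by the other view."""
    labeled = dict(seed_labels)
    for _ in range(rounds):
        for view in (view_a, view_b):
            cents = fit_view(view, labeled)
            pool = [i for i in range(len(view)) if i not in labeled]
            scored = sorted(((predict(cents, view[i]), i) for i in pool),
                            key=lambda t: -t[0][1])
            for (label, _), i in scored[:per_round]:
                labeled[i] = label
    return labeled

# Synthetic data (assumption): class 0 = low haze, class 1 = high haze.
random.seed(0)
truth = [0] * 30 + [1] * 30

def make_view(gap):
    """Two Gaussian features per sample, centred at gap * class."""
    return [[random.gauss(gap * t, 1.0) for _ in range(2)] for t in truth]

temporal_view = make_view(5.0)   # hypothetical time-related features
spatial_view = make_view(5.0)    # hypothetical space-related features
seed_labels = {0: 0, 1: 0, 30: 1, 31: 1}  # only four labeled samples

pseudo = co_train(temporal_view, spatial_view, seed_labels)
cents_t = fit_view(temporal_view, pseudo)
cents_s = fit_view(spatial_view, pseudo)

# Final prediction: take whichever view is more confident for each sample.
preds = [max(predict(cents_t, temporal_view[i]),
             predict(cents_s, spatial_view[i]),
             key=lambda p: p[1])[0]
         for i in range(len(truth))]
accuracy = sum(p == t for p, t in zip(preds, truth)) / len(truth)
print(f"labeled after co-training: {len(pseudo)}, accuracy: {accuracy:.2f}")
```

Tri-training follows the same pattern with three classifiers, where a sample is typically pseudo-labeled for one classifier when the other two agree on it; that variant is not shown here.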

Published in Heliyon

ISSN: 2405-8440 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General); Social Sciences: Social sciences (General)
Website: https://www.cell.com/heliyon/home
