Open Physics (Mar 2020)
Diagnostic model of low visibility events based on C4.5 algorithm
Abstract
In this study, low visibility in Nanjing city is classified and predicted from data observed from 2014 to 2016, using the machine-learning decision tree algorithm C4.5. The model was trained on three quarters of the data samples until its self-learning (training) accuracy reached 88.32%. The remaining quarter of the samples was used to verify the model's prediction ability; the test accuracy reached 88.34%, indicating that the model classifies and diagnoses these events well. The decision rules learned from the training samples show that relative humidity, PM10 and PM2.5 are important factors in diagnosing "whether low visibility events will occur in Nanjing": when relative humidity is below 90% and the PM2.5 concentration is below 146, low visibility events are less likely; when relative humidity is high (≥ 90%) and the PM10 concentration is ≥ 59, low visibility events are more likely to occur; and when relative humidity is very high (≥ 96%), there is a high probability that low visibility events will occur even when the PM10 concentration is low (<59).
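The three rules quoted in the abstract can be written out as a small lookup, which makes it easier to see which combinations of humidity and particulate concentration the abstract covers and which it leaves open. The following is only a sketch of those three statements, not the trained C4.5 tree itself: the function name, argument names and outcome labels are hypothetical, and the concentration thresholds are used exactly as given, in whatever units the source uses.

```python
def diagnose_low_visibility(rh, pm10, pm25):
    """Return the qualitative outcome the abstract states for one observation.

    rh   -- relative humidity in percent
    pm10 -- PM10 concentration (units not stated in the abstract)
    pm25 -- PM2.5 concentration (units not stated in the abstract)

    Hypothetical helper: it encodes only the three rules quoted in the
    abstract and returns "not stated" for every other combination.
    """
    if rh < 90:
        # Rule 1: humidity below 90% and PM2.5 below 146.
        if pm25 < 146:
            return "less likely"
        return "not stated"

    # From here on, relative humidity is at least 90%.
    if pm10 >= 59:
        # Rule 2: high humidity together with PM10 of 59 or more.
        return "more likely"

    if rh >= 96:
        # Rule 3: very high humidity, even though PM10 is below 59.
        return "high probability"

    # Humidity between 90% and 96% with PM10 below 59 is not covered.
    return "not stated"


if __name__ == "__main__":
    for obs in [(80, 40, 100), (92, 70, 50), (97, 30, 20), (92, 30, 20)]:
        print(obs, "->", diagnose_low_visibility(*obs))
```

Two regions are left as "not stated" here because the abstract gives no rule for them: humidity below 90% with PM2.5 of 146 or more, and humidity between 90% and 96% with PM10 below 59.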
Keywords