Atmosphere (Jun 2021)

K-Means and C4.5 Decision Tree Based Prediction of Long-Term Precipitation Variability in the Poyang Lake Basin, China

  • Dan Lou,
  • Mengxi Yang,
  • Dawei Shi,
  • Guojie Wang,
  • Waheed Ullah,
  • Yuanfang Chai,
  • Yutian Chen

DOI
https://doi.org/10.3390/atmos12070834
Journal volume & issue
Vol. 12, no. 7
p. 834

Abstract

Applying machine learning algorithms alongside Earth System Models in the atmospheric sciences has the potential to improve prediction, forecasting, and the reconstruction of missing data. In the current study, two machine learning techniques, K-Means and the C4.5 decision tree, are combined to separate observed precipitation into clusters and to classify the associated large-scale circulation indices. Observed precipitation from the Chinese Meteorological Agency (CMA) during 1961–2016 for 83 stations in the Poyang Lake basin (PLB) is used. The K-Means results show two precipitation clusters that split the PLB into a northern and a southern cluster, with a silhouette coefficient of ~0.5. The leading cluster (C1) contains 48 stations, accounting for 58% of the stations, while Cluster 2 (C2) contains 35, accounting for 42%. The interannual variability in precipitation differs significantly between the two clusters. The C4.5 decision tree is employed to explore the large-scale atmospheric indices from the National Climate Center (NCC) associated with each cluster, using the preceding spring season as a predictor. C1 precipitation was linked with the position and intensity of the subtropical ridgeline over Northern Africa, whereas C2 precipitation was suggested to be associated with the Atlantic-European Polar Vortex Area Index. The precipitation anomalies further validated the results of both algorithms. The findings are in accordance with previous studies conducted globally and hence support the application of machine learning techniques in atmospheric science on sub-regional and sub-seasonal scales. Future studies should explore the dynamics of the indicators derived from K-Means and C4.5 for a better assessment on a regional scale. This machine learning based research may offer a new approach to climate forecasting.
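
The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state which features were clustered, so the two-dimensional station values below are synthetic, and the names kmeans, silhouette, north, and south are hypothetical. Only the station counts (48 and 35) and the choice of two clusters come from the abstract. The sketch uses Lloyd's algorithm for K-Means and the mean silhouette coefficient, with the Python standard library only.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centers


def silhouette(points, labels):
    """Mean silhouette coefficient, (b - a) / max(a, b), averaged over points."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # convention for singleton or single-cluster cases
            continue
        a = sum(own) / len(own)                            # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in dists.values())   # nearest other cluster
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Synthetic stand-in for station features (assumption, not the CMA data).
rng = random.Random(42)
north = [(rng.gauss(1.0, 0.3), rng.gauss(0.5, 0.3)) for _ in range(48)]
south = [(rng.gauss(-1.0, 0.3), rng.gauss(-0.5, 0.3)) for _ in range(35)]
stations = north + south

labels, centers = kmeans(stations, k=2)
score = silhouette(stations, labels)
sizes = sorted(labels.count(c) for c in set(labels))
print("cluster sizes:", sizes)
print("silhouette coefficient:", round(score, 2))
```

A silhouette coefficient near 1 indicates compact, well-separated clusters, while values near 0 indicate overlap. The synthetic groups here are deliberately well separated, so the value printed by this sketch is not comparable to the ~0.5 reported for the observed data.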

Keywords