Ecological Indicators (Sep 2024)

Fine-scale population mapping on Tibetan Plateau using the ensemble machine learning methods and multisource data

  • Huiming Zhang,
  • Jingqiao Fu,
  • Feixiang Li,
  • Qian Chen,
  • Tao Ye,
  • Yili Zhang,
  • Xuchao Yang

Journal volume & issue
Vol. 166
p. 112307

Abstract

On the Tibetan Plateau, a high-elevation region with a sparsely distributed population, disaster prevention and management depend heavily on gridded population data. This study uses multi-source physical geographic and socio-economic factors to map the population distribution across the plateau. Using data from the Seventh National Census in 2020, we apply three individual machine learning methods (Random Forest, gradient boosting decision tree (GBDT), and XGBoost) and two multi-model ensemble methods (weighted average ensemble and stacking ensemble) to spatialize the census population onto a 100-meter grid. The results show that every model achieves higher spatialization accuracy than the WorldPop dataset. Among the individual models, Random Forest is the most accurate (RMSE = 4061.09, nRMSE = 44.71%); among the ensemble models, the stacking ensemble is the most accurate (RMSE = 4094.47, nRMSE = 44.26%). Tencent location-based services data emerges as a crucial variable in all models, which underscores the importance of integrating multi-source big data. This study highlights the effectiveness of ensemble models and multi-source big data in improving population mapping accuracy, especially in regions with complex terrain.
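As a rough illustration of the weighted average ensemble and the two error metrics named in the abstract, the following minimal Python sketch combines predictions from three models and scores them. The abstract does not state how the weights or nRMSE are defined, so two assumptions are made here: weights are proportional to the inverse of each model's RMSE, and nRMSE is RMSE divided by the mean observed value, expressed as a percentage. All function names and the toy numbers are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def nrmse_percent(observed, predicted):
    """RMSE normalized by the mean observed value, in percent.

    Assumption: the abstract does not define its normalization.
    """
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


def inverse_rmse_weights(observed, model_predictions):
    """Weights proportional to 1 / RMSE, scaled to sum to 1.

    Assumption: one common weighting scheme, not necessarily the paper's.
    """
    # Guard against division by zero for a model with a perfect fit.
    inverse = {
        name: 1.0 / max(rmse(observed, preds), 1e-12)
        for name, preds in model_predictions.items()
    }
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_average_ensemble(model_predictions, weights):
    """Combine per-model predictions into one weighted prediction per unit."""
    n_units = len(next(iter(model_predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in model_predictions.items())
        for i in range(n_units)
    ]


# Made-up toy values: census counts for four units and three models' estimates.
observed = [100.0, 200.0, 300.0, 400.0]
model_predictions = {
    "random_forest": [110.0, 190.0, 310.0, 390.0],
    "gbdt": [120.0, 180.0, 320.0, 380.0],
    "xgboost": [95.0, 205.0, 295.0, 405.0],
}

weights = inverse_rmse_weights(observed, model_predictions)
ensemble = weighted_average_ensemble(model_predictions, weights)
ensemble_rmse = rmse(observed, ensemble)
ensemble_nrmse = nrmse_percent(observed, ensemble)
```

In practice the weights would be estimated on held-out data rather than on the same units used for evaluation, which this sketch does not attempt.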

Keywords