Scientific Data (Mar 2025)

Mapping 1-km soybean yield across China from 2001 to 2020 based on ensemble learning

  • Min Zhang,
  • Xinlei Xu,
  • Junji Ou,
  • Zengguang Zhang,
  • Fangzheng Chen,
  • Lijie Shi,
  • Bin Wang,
  • Meiqin Zhang,
  • Liang He,
  • Xueliang Zhang,
  • Yong Chen,
  • Kelin Hu,
  • Puyu Feng

DOI
https://doi.org/10.1038/s41597-025-04738-x
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 19

Abstract

Soybean is a critical agricultural product in China, but domestic production cannot meet the substantial demand, leading to heavy reliance on imports. To support the scientific formulation of agricultural policies and the optimization of domestic planting structures, we developed ChinaSoyYield1km, a high-resolution annual soybean yield dataset for China covering 2001–2020. The dataset was generated by applying ensemble learning algorithms and spatial decomposition to a comprehensive set of multi-source data, including climate variables, remote sensing imagery, soil properties, agricultural management practices, and official yield records. Integrating these diverse sources allows the factors influencing soybean yield to be examined at 1-km resolution. The resulting dataset captures over 50% of the yield variability at the county scale and is more accurate than publicly available datasets, with reductions in Root Mean Square Error (RMSE) ranging from 0.18 to 0.60 t/ha. We anticipate that the dataset will support agricultural research, planning, and policy-making related to soybean cultivation, providing a valuable resource for both the scientific community and government.
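For readers unfamiliar with the two accuracy measures quoted in the abstract, the following is a minimal sketch of how RMSE and the share of explained variability could be computed for county-level yields. It assumes that "captures over 50% of the yield variability" corresponds to a coefficient of determination (R²) above 0.5, which the abstract does not state explicitly. The function names and the sample yields are hypothetical and are not taken from ChinaSoyYield1km.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error, in the same units as the inputs (e.g. t/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: share of observed variance explained."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical county-level yields in t/ha (illustrative only, not real data).
observed = [1.8, 2.1, 1.5, 2.4, 1.9]
predicted = [1.7, 2.0, 1.7, 2.2, 2.0]

print(f"RMSE = {rmse(observed, predicted):.3f} t/ha")
print(f"R^2  = {r_squared(observed, predicted):.3f}")
```

Under this reading, a reported RMSE reduction of 0.18 to 0.60 t/ha would be the difference between the `rmse` of a comparison dataset and that of the new dataset, each evaluated against the same official yield records.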