Scientific Data (Mar 2025)
Mapping 1-km soybean yield across China from 2001 to 2020 based on ensemble learning
Abstract
Soybean is a critical agricultural product in China, where domestic production cannot meet the substantial demand, leading to heavy reliance on imports. To support the scientific formulation of agricultural policies and the optimization of domestic planting structures, we developed ChinaSoyYield1km, a high-resolution annual soybean yield dataset for China (2001–2020). The dataset was generated by applying ensemble learning algorithms and spatial decomposition to multi-source data, including climate variables, remote sensing imagery, soil properties, agricultural management practices, and official yield records. Integrating these sources allows the factors influencing soybean yield to be examined at a 1-km resolution. The resulting dataset captures over 50% of the yield variability at the county scale and is more accurate than publicly available datasets, reducing Root Mean Square Error (RMSE) by 0.18 to 0.60 t/ha. We anticipate that the dataset will support agricultural studies, planning, and policy-making related to soybean cultivation, providing a valuable resource for both the scientific community and government.
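For readers unfamiliar with the two accuracy measures quoted above, the following minimal Python sketch shows how RMSE and the share of explained variability are commonly computed. It is an illustration under stated assumptions, not the authors' evaluation code: the function names and the sample yields are hypothetical, and "share of variability captured" is taken here to mean the coefficient of determination (R²), which the abstract does not confirm.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error, in the same units as the inputs (e.g. t/ha)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumption: used here as one common reading of "share of yield
    variability captured"; the source does not state its exact definition.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical county-level yields in t/ha, invented for illustration only.
    county_observed = [1.8, 2.1, 1.5, 2.4]
    county_predicted = [1.7, 2.0, 1.7, 2.2]
    print(f"RMSE: {rmse(county_observed, county_predicted):.3f} t/ha")
    print(f"R^2:  {r_squared(county_observed, county_predicted):.3f}")
```

Under this reading, a value of R² above 0.5 corresponds to "over 50% of the yield variability", and a lower RMSE against the same county records indicates a more accurate gridded product.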