Scientific Data (Sep 2024)

The 500-meter long-term winter wheat grain protein content dataset for China from multi-source data

  • Xiaobin Xu,
  • Lili Zhou,
  • James Taylor,
  • Raffaele Casa,
  • Chengzhi Fan,
  • Xiaoyu Song,
  • Guijun Yang,
  • Wenjiang Huang,
  • Zhenhai Li

DOI
https://doi.org/10.1038/s41597-024-03866-0
Journal volume & issue
Vol. 11, no. 1
pp. 1 – 13

Abstract


In China, the need for precise wheat grain protein content (GPC) data is rising with growing food consumption demands and global market competition. However, because extensive, long-term, high-resolution benchmark data are lacking, previous GPC studies have focused primarily on experimental fields, small geographic units, and limited time spans. China's diverse terrain further complicates large-scale GPC estimation. To address this challenge and the data gap, the first 500-meter spatial resolution, long-term winter wheat GPC dataset covering the major planting regions in China (CNWheatGPC-500) was created by integrating multi-source data from ERA5 and MODIS. The results demonstrate that the GPC estimation model based on a hierarchical linear model significantly outperformed other conventional models. The validation dataset exhibited an R² of 0.45 and an RMSE of 0.96%. In cross-validation, the RMSE values ranged from 0.90% in Gansu to 1.32% in Anhui. For leave-one-year-out cross-validation, the RMSE values ranged from 0.77% to 1.11%. CNWheatGPC-500 offers valuable insights for enhancing wheat production, quality control, and agricultural decision-making.
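
As an illustration of the leave-one-year-out validation mentioned in the abstract, the following is a minimal Python sketch. It is not the authors' hierarchical linear model: it substitutes a single-predictor ordinary least-squares fit, and the record layout `(year, predictor, gpc_percent)` and the function names `fit_line`, `rmse`, and `leave_one_year_out` are assumptions made only for this example. Each year is held out in turn, the model is fitted on the remaining years, and the RMSE is computed on the held-out year.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x (stand-in model, not the paper's HLM)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return intercept, slope


def rmse(observed, predicted):
    """Root mean square error, in the same units as the inputs (here, % GPC)."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def leave_one_year_out(records):
    """records: iterable of (year, predictor, gpc_percent) tuples (hypothetical layout).

    Returns a dict mapping each held-out year to the RMSE obtained when the
    model is fitted on all other years and evaluated on that year.
    """
    records = list(records)
    results = {}
    for held_out in sorted({year for year, _, _ in records}):
        train = [r for r in records if r[0] != held_out]
        test = [r for r in records if r[0] == held_out]
        intercept, slope = fit_line([r[1] for r in train], [r[2] for r in train])
        predictions = [intercept + slope * r[1] for r in test]
        results[held_out] = rmse([r[2] for r in test], predictions)
    return results
```

Calling `leave_one_year_out` on a list of such tuples yields one RMSE per year, which is the kind of per-year range the abstract reports. The same loop structure would apply to a leave-one-province-out split by grouping on a province field instead of the year.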