Improving global soil moisture prediction through cluster-averaged sampling strategy
Qingliang Li,
Qiyun Xiao,
Cheng Zhang,
Jinlong Zhu,
Xiao Chen,
Yuguang Yan,
Pingping Liu,
Wei Shangguan,
Zhongwang Wei,
Lu Li,
Wenzong Dong,
Yongjiu Dai
Affiliations
Qingliang Li
College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China; Research Institute for Scientific and Technological Innovation, Changchun Normal University, Changchun 130032, China; Corresponding author at: College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China.
Qiyun Xiao
College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China
Cheng Zhang
College of Computer Science and Technology, Jilin University, Changchun 130032, China
Jinlong Zhu
College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China; Research Institute for Scientific and Technological Innovation, Changchun Normal University, Changchun 130032, China
Xiao Chen
College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China; Research Institute for Scientific and Technological Innovation, Changchun Normal University, Changchun 130032, China
Yuguang Yan
College of Computer Science and Technology, Changchun Normal University, Changchun 130032, China; Research Institute for Scientific and Technological Innovation, Changchun Normal University, Changchun 130032, China
Pingping Liu
College of Computer Science and Technology, Jilin University, Changchun 130032, China
Wei Shangguan
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), and Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
Zhongwang Wei
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), and Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
Lu Li
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), and Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
Wenzong Dong
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), and Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
Yongjiu Dai
Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), and Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-Sen University, Guangzhou, Guangdong, China
Understanding and predicting global soil moisture (SM) is crucial for water resource management and agricultural production. While deep learning (DL) methods have shown strong performance in SM prediction, imbalances among training samples with different characteristics pose a significant challenge. We propose that improving the diversity and balance of batch training samples during gradient descent can help address this issue. To test this hypothesis, we developed a Cluster-Averaged Sampling (CAS) strategy based on unsupervised learning. CAS trains the model on data sampled evenly from different clusters, ensuring both sample diversity across clusters and numerical consistency within each cluster. This prevents the model from overemphasizing specific sample characteristics and leads to more balanced feature learning. Experiments on the LandBench1.0 dataset with five different seeds for 1-day lead-time global predictions show that CAS outperforms several Long Short-Term Memory (LSTM)-based models that do not employ this strategy. The median Coefficient of Determination (R²) improved by 2.36% to 4.31%, while Kling-Gupta Efficiency (KGE) improved by 1.95% to 3.16%. In specific high-latitude regions, R² improvements exceeded 40%. To further validate CAS under realistic conditions, we tested it on Soil Moisture Active and Passive Level 3 (SMAP-L3) satellite data for 1- to 3-day lead-time global predictions, which confirmed its efficacy. The study substantiates the CAS strategy and introduces a novel training method for enhancing the generalization of DL models.
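The sampling idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the features being clustered, or how each batch is assembled, so everything below is an assumption made for illustration: plain k-means stands in for the unsupervised step, samples are drawn with replacement so that small clusters can fill their share of a batch, and the function names `kmeans_labels` and `cluster_balanced_batches` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def kmeans_labels(features, n_clusters, n_iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm); returns one cluster label per sample.

    Assumption: k-means is only a stand-in for the unsupervised step.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct samples (a copy).
    start = rng.choice(len(features), size=n_clusters, replace=False)
    centroids = features[start].astype(float)
    for _ in range(n_iters):
        # Squared Euclidean distance from every sample to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        labels = (diffs ** 2).sum(axis=2).argmin(axis=1)
        for k in range(n_clusters):
            members = features[labels == k]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    return labels


def cluster_balanced_batches(labels, batch_size, n_batches, seed=0):
    """Yield index arrays holding an equal number of samples per cluster."""
    rng = np.random.default_rng(seed)
    cluster_ids = np.unique(labels)  # only clusters that actually have members
    per_cluster = batch_size // len(cluster_ids)
    if per_cluster == 0:
        raise ValueError("batch_size must be at least the number of clusters")
    members = [np.flatnonzero(labels == k) for k in cluster_ids]
    for _ in range(n_batches):
        # Assumption: sample with replacement so small clusters fill their quota.
        picks = [rng.choice(m, size=per_cluster, replace=True) for m in members]
        batch = np.concatenate(picks)
        rng.shuffle(batch)  # avoid cluster-ordered batches
        yield batch


# Synthetic, imbalanced stand-in data: 900 samples near 0 and 100 near 5.
data_rng = np.random.default_rng(42)
features = np.vstack([
    data_rng.normal(0.0, 0.5, size=(900, 3)),
    data_rng.normal(5.0, 0.5, size=(100, 3)),
])
labels = kmeans_labels(features, n_clusters=2)
batches = list(cluster_balanced_batches(labels, batch_size=64, n_batches=5))
```

Each index array in `batches` contains the same number of samples from every non-empty cluster, regardless of how imbalanced the clusters are in the full dataset. In a training loop, such index arrays would select the inputs and targets for one gradient step.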