Comparison of methods to aggregate climate data to predict crop yield: an application to soybean

Mathilde Chen; Nicolas Guilpart; David Makowski

doi:10.1088/1748-9326/ad42b5

Environmental Research Letters (Jan 2024)

Affiliations

Mathilde Chen: Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA PS, 91120 Palaiseau, France; CIRAD, UMR PHIM, F-34398 Montpellier, France; PHIM, Univ Montpellier, CIRAD, INRAE, Institut Agro, IRD, Montpellier, France
Nicolas Guilpart: Université Paris-Saclay, AgroParisTech, INRAE, UMR Agronomie, 91120 Palaiseau, France
David Makowski: Université Paris-Saclay, INRAE, AgroParisTech, UMR MIA PS, 91120 Palaiseau, France

DOI: https://doi.org/10.1088/1748-9326/ad42b5
Journal volume & issue: Vol. 19, no. 5
p. 054049

Abstract

High-dimensional climate data collected at a daily, monthly, or seasonal time step are now commonly used to predict crop yields worldwide with standard statistical models or machine learning models. Because using all available individual climate variables generally leads to computational problems, over-fitting, and over-parameterization, the climate data used as predictors must be aggregated. However, there is no consensus on the best way to perform this task, and little is known about how the type of aggregation method and the temporal resolution of the weather data affect model performance. Using historical soybean yield and climate data from 1981 to 2016 at 3447 sites worldwide, this study compares different temporal resolutions (daily, monthly, or seasonal) and dimension reduction techniques (principal component analysis (PCA), partial least squares regression, and their functional counterparts) for aggregating the climate data used as inputs to machine learning and linear regression (LR) models that predict yields. Random forest models outperformed LR models and were less sensitive to the climate aggregation method when predicting soybean yields. With our models, daily climate data did not improve predictive performance compared to monthly data. Models based on PCA or on averages of monthly data showed better predictive performance than those relying on more sophisticated dimension reduction techniques. By highlighting how sensitive the projected impact of climate on crop yields is to the temporal resolution and aggregation of climate input data, this study shows that model performance can be improved by choosing the most appropriate temporal resolution and aggregation technique. Practical recommendations based on our results are formulated in this article.
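
To make the idea of temporal aggregation concrete, here is a minimal sketch of how one year of daily values for a single climate variable could be collapsed into 12 monthly averages or into one seasonal average. It is an illustration only and is not the authors' code: the function names, the non-leap calendar year, the May to September season window, and the synthetic temperature series are all assumptions introduced here.

```python
from statistics import mean

# Days per month for a non-leap year (assumption: one calendar year per series).
DAYS_PER_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def monthly_means(daily):
    """Collapse 365 daily values into 12 monthly averages."""
    if len(daily) != sum(DAYS_PER_MONTH):
        raise ValueError("expected 365 daily values")
    out, start = [], 0
    for n_days in DAYS_PER_MONTH:
        out.append(mean(daily[start:start + n_days]))
        start += n_days
    return out


def seasonal_mean(daily, first_month, last_month):
    """Average daily values over an inclusive range of months (1 to 12)."""
    if len(daily) != sum(DAYS_PER_MONTH):
        raise ValueError("expected 365 daily values")
    start = sum(DAYS_PER_MONTH[:first_month - 1])
    end = sum(DAYS_PER_MONTH[:last_month])
    return mean(daily[start:end])


# Hypothetical daily series: every day of month m carries the value m,
# so each monthly average is easy to check by eye.
daily_temp = [float(m + 1) for m, n in enumerate(DAYS_PER_MONTH) for _ in range(n)]

monthly = monthly_means(daily_temp)        # 365 predictors reduced to 12
season = seasonal_mean(daily_temp, 5, 9)   # 365 predictors reduced to 1 (assumed May-Sep window)
```

Each step trades temporal detail for fewer predictors: 365 daily values become 12 monthly values or a single seasonal value per climate variable. Monthly averages like these could then be passed directly to a regression or random forest model, or compressed further with PCA.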

Published in Environmental Research Letters

ISSN: 1748-9326 (Online)
Publisher: IOP Publishing
Country of publisher: United Kingdom
LCC subjects: Technology: Environmental technology. Sanitary engineering; Geography. Anthropology. Recreation: Environmental sciences; Science: Physics
Website: https://iopscience.iop.org/journal/1748-9326
