Soil Science-Informed Machine Learning

Budiman Minasny; Toshiyuki Bandai; Teamrat A. Ghezzehei; Yin-Chung Huang; Yuxin Ma; Alex B. McBratney; Wartini Ng; Sarem Norouzi; Jose Padarian; Rudiyanto; Amin Sharififar; Quentin Styc; Marliana Widyastuti

Geoderma (Dec 2024)

Affiliations

Budiman Minasny: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia
Toshiyuki Bandai: Earth and Environmental Sciences Area, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
Teamrat A. Ghezzehei: Life & Environmental Sciences Department, University of California, Merced, CA 95343, USA
Yin-Chung Huang: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia
Yuxin Ma: New South Wales Department of Climate Change, Energy, the Environment and Water, Parramatta, NSW 2150, Australia
Alex B. McBratney: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia
Wartini Ng: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia
Sarem Norouzi: Department of Agroecology, Aarhus University, 8830 Tjele, Denmark
Jose Padarian: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia
Rudiyanto: Faculty of Fisheries and Food Science, Universiti Malaysia Terengganu, 21030 Kuala Nerus, Terengganu, Malaysia
Amin Sharififar: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia
Quentin Styc: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia
Marliana Widyastuti: School of Life and Environmental Sciences & Sydney Institute of Agriculture, The University of Sydney, NSW 2006, Australia

Journal volume & issue: Vol. 452, p. 117094

Abstract

Machine learning (ML) applications in soil science have significantly increased over the past two decades, reflecting a growing trend towards data-driven research addressing soil security. This extensive application has mainly focused on enhancing predictions of soil properties, particularly soil organic carbon, and improving the accuracy of digital soil mapping (DSM). Despite these advancements, the application of ML in soil science faces challenges related to data scarcity and the interpretability of ML models. There is a need for a shift towards Soil Science-Informed ML (SoilML) models that use the power of ML but also incorporate soil science knowledge in the training process to make predictions more reliable and generalisable. This paper proposes methodologies for embedding ML models with soil science knowledge to overcome current limitations. Incorporating soil science knowledge into ML models involves using observational priors to enhance training datasets, designing model structures which reflect soil science principles, and supervising model training with soil science-informed loss functions. The informed loss functions include observational constraints, coherency rules such as regularisation to avoid overfitting, and prior or soil-knowledge constraints that incorporate existing information about the parameters or outputs. By way of illustration, we present examples from four fields: digital soil mapping, soil spectroscopy, pedotransfer functions, and dynamic soil property models. We discuss the potential to integrate process-based models for improved prediction, the use of physics-informed neural networks, limitations, and the issue of overparametrisation. These approaches improve the relevance of ML predictions in soil science and enhance the models’ ability to generalise across different scenarios while maintaining soil science principles, transparency and reliability.
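As a rough illustration of the three loss components named in the abstract (observational constraints, a coherency rule such as regularisation, and a soil-knowledge constraint), a composite loss could be sketched as follows. This is not the authors' implementation: the function name `soil_informed_loss`, the weighting factors `lam_reg` and `lam_prior`, and the choice of a physical-range penalty (for example, a property expressed in percent bounded to 0 to 100) are all assumptions made for the example.

```python
import numpy as np

def soil_informed_loss(y_obs, y_pred, weights, lower=0.0, upper=100.0,
                       lam_reg=1e-3, lam_prior=1.0):
    """Hypothetical composite loss: data misfit + regularisation + range prior."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Observational constraint: mean squared error against measurements.
    data_term = np.mean((y_pred - y_obs) ** 2)

    # Coherency rule: L2 regularisation on model weights to discourage overfitting.
    reg_term = np.sum(weights ** 2)

    # Soil-knowledge constraint (assumed here): predictions should stay inside
    # a physically plausible range [lower, upper]; violations are penalised
    # quadratically, and in-range predictions contribute zero.
    below = np.clip(lower - y_pred, 0.0, None)
    above = np.clip(y_pred - upper, 0.0, None)
    prior_term = np.mean(below ** 2 + above ** 2)

    total = data_term + lam_reg * reg_term + lam_prior * prior_term
    return total, {"data": data_term, "reg": reg_term, "prior": prior_term}

# Usage: one prediction (-1.0) violates the assumed lower bound of 0.
total, parts = soil_informed_loss(
    y_obs=[1.0, 2.0], y_pred=[1.0, -1.0], weights=[1.0, 1.0],
    lam_reg=0.5, lam_prior=2.0,
)
print(total, parts)
```

In practice the relative weights of the three terms would need tuning, and the knowledge constraint could instead encode other soil science rules (for example, particle-size fractions summing to 100%); the range penalty is only one simple possibility.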

Published in Geoderma

ISSN: 1872-6259 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Science
Website: https://www.sciencedirect.com/journal/geoderma
