Geoderma (Oct 2024)

Self-supervised learning of Vision Transformers for digital soil mapping using visual data

  • Paul Tresson,
  • Maxime Dumont,
  • Marc Jaeger,
  • Frédéric Borne,
  • Stéphane Boivin,
  • Loïc Marie-Louise,
  • Jérémie François,
  • Hassan Boukcim,
  • Hervé Goëau

Journal volume & issue
Vol. 450
p. 117056

Abstract


In arid environments, prospecting cultivable land is challenging due to harsh climatic conditions and vast, hard-to-access areas. However, the soil is often bare, with little vegetation cover, making it easy to observe from above. Hence, remote sensing can drastically reduce the cost of exploring these areas. For the past few years, deep learning has extended remote sensing analysis, first with Convolutional Neural Networks (CNNs), then with Vision Transformers (ViTs). The main drawback of deep learning methods is their reliance on large calibration datasets, as data collection is a cumbersome and costly task, particularly in drylands. However, recent studies demonstrate that ViTs can be trained in a self-supervised manner to take advantage of large amounts of unlabelled data to pre-train models. These backbone models can then be finetuned to learn a supervised regression model with few labelled data.

In our study, we trained ViTs in a self-supervised way on a 9500 km² satellite image of drylands in Saudi Arabia with a spatial resolution of 1.5 m per pixel. The resulting models were used to extract features describing the bare soil and predict soil attributes (pH H2O, pH KCl, Si composition). Using only RGB data, we can accurately predict these soil properties and achieve, for instance, an RMSE of 0.40 ± 0.03 when predicting alkaline soil pH. We also assess the effectiveness of adding covariates such as elevation. The pretrained models can also serve as visual feature extractors: the extracted features can be used to automatically generate a clustered map of an area or as input to random forest models, providing a versatile way to generate maps with limited labelled data and input variables.
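The downstream uses of the pretrained backbone described above (few-label regression of a soil attribute with a random forest, and unsupervised clustering of an area) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature matrix is synthetic random data standing in for tile embeddings extracted from a frozen self-supervised ViT, and the embedding dimension, label values, and model hyperparameters are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Placeholder for ViT embeddings of image tiles (n_tiles x embedding_dim).
# In the study these would come from the self-supervised backbone;
# 384 is an illustrative dimension, not the paper's value.
features = rng.normal(size=(500, 384))

# Placeholder soil pH labels for the small labelled subset (synthetic).
ph = 7.5 + 0.5 * rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(
    features, ph, test_size=0.2, random_state=0
)

# Few-label regression: a random forest on the frozen ViT features.
rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
rmse = mean_squared_error(y_test, rf.predict(X_test)) ** 0.5

# Unsupervised mapping: cluster the tile embeddings into soil units,
# which can then be mapped back to tile locations.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(features)
```

With real embeddings, `clusters` would be reshaped to the tile grid to produce the clustered map, and `rf` would predict the soil attribute for every unlabelled tile.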

Keywords