Assessing the Generalization Capacity of Convolutional Neural Networks and Vision Transformers for Deforestation Detection in Tropical Biomes

P. J. Soto Vega; D. Lobo Torres; G. X. Andrade-Miranda; G. A. O. P. da Costa; R. Q. Feitosa

doi:10.5194/isprs-archives-XLVIII-3-2024-519-2024

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences (Nov 2024)

Affiliations

P. J. Soto Vega: L@bISEN, Vision-AD and Auto-ROB, ISEN Yncréa Ouest, 20 rue Cuirassé Bretagne, 29200 Brest, France
D. Lobo Torres: Dept. of Electrical Engineering, Computer Vision Laboratory, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil
G. X. Andrade-Miranda: University Brest, LaTIM, INSERM UMR 1101, Brest, France
G. A. O. P. da Costa: Institute of Mathematics and Statistics, State University of Rio de Janeiro (UERJ), Rio de Janeiro, Brazil
R. Q. Feitosa: Dept. of Electrical Engineering, Computer Vision Laboratory, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro, Brazil

DOI: https://doi.org/10.5194/isprs-archives-XLVIII-3-2024-519-2024
Journal volume & issue: Vol. XLVIII-3-2024
Pages: 519–525

Abstract

Deep Learning (DL) models, such as Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs), have become popular for change detection tasks, including deforestation mapping. However, too little attention has been paid to domain shift, which degrades classification performance when pre-trained models are applied to areas with different forest cover and deforestation practices. This study compares DL methods for deforestation detection, focusing on how well CNNs and ViTs cope with domain shift. Two models, the CNN-based DeepLabv3+ and the ViT-based UNETR, were trained using remote sensing images and reference data from one location and then tested at other sites to simulate real-world scenarios. The results showed that the ViT-based architecture achieved better performance when trained and tested in the same region but showed lower generalization capacity in cross-domain scenarios. We consider this a work in progress that needs further research to confirm its findings, including the evaluation of additional architectures on a wider range of domains.
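The train-on-one-site, test-on-the-others protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names (`f1_score`, `cross_domain_matrix`), the site names, the toy threshold "model", and the choice of F1 as the metric are all assumptions made for illustration, and the numbers are placeholders rather than results.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

# Flattened binary masks: 1 = deforestation, 0 = no deforestation.
Labels = Sequence[int]
Site = Tuple[Sequence[float], Labels]  # (per-pixel feature, reference label)


def f1_score(reference: Labels, predicted: Labels) -> float:
    """F1 for the positive (deforestation) class; 0.0 when undefined."""
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == 1 and p == 1)
    fp = sum(1 for r, p in pairs if r == 0 and p == 1)
    fn = sum(1 for r, p in pairs if r == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cross_domain_matrix(
    sites: Dict[str, Site],
    train_fn: Callable[[Sequence[float], Labels], object],
    predict_fn: Callable[[object, Sequence[float]], Labels],
) -> Dict[Tuple[str, str], float]:
    """Train one model per site, then score it on every site.

    Keys are (source_site, target_site). Entries with source == target are
    the in-domain scores; all other entries are cross-domain scores.
    """
    models = {name: train_fn(x, y) for name, (x, y) in sites.items()}
    scores = {}
    for source, target in product(sites, repeat=2):
        x, y = sites[target]
        scores[(source, target)] = f1_score(y, predict_fn(models[source], x))
    return scores


# Hypothetical stand-in for a real network: a single learned threshold.
def train_threshold(x: Sequence[float], y: Labels) -> float:
    pos = [v for v, label in zip(x, y) if label == 1]
    neg = [v for v, label in zip(x, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold: float, x: Sequence[float]) -> Labels:
    return [1 if v > threshold else 0 for v in x]


# Placeholder data for two hypothetical sites (not from the paper).
toy_sites: Dict[str, Site] = {
    "site_a": ([0.1, 0.2, 0.8, 0.9], [0, 0, 1, 1]),
    "site_b": ([0.3, 0.4, 0.6, 0.7], [0, 0, 1, 1]),
}

if __name__ == "__main__":
    matrix = cross_domain_matrix(toy_sites, train_threshold, predict_threshold)
    for (source, target), score in sorted(matrix.items()):
        kind = "in-domain" if source == target else "cross-domain"
        print(f"train={source} test={target} ({kind}): F1={score:.2f}")
```

Comparing the in-domain entries with the cross-domain entries for each architecture is one way to quantify the generalization gap. In a real experiment, `train_fn` and `predict_fn` would wrap the training and inference routines of the segmentation models being compared.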

Published in The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences

ISSN: 1682-1750 (Print); 2194-9034 (Online)
Publisher: Copernicus Publications
Country of publisher: Germany
LCC subjects: Technology: Engineering (General). Civil engineering (General): Applied optics. Photonics
Website: http://www.isprs.org/publications/archives.aspx
