A global land cover training dataset from 1984 to 2020

Radost Stanimirova; Katelyn Tarrio; Konrad Turlej; Kristina McAvoy; Sophia Stonebrook; Kai-Ting Hu; Paulo Arévalo; Eric L. Bullock; Yingtong Zhang; Curtis E. Woodcock; Pontus Olofsson; Zhe Zhu; Christopher P. Barber; Carlos M. Souza; Shijuan Chen; Jonathan A. Wang; Foster Mensah; Marco Calderón-Loor; Michalis Hadjikakou; Brett A. Bryan; Jordan Graesser; Dereje L. Beyene; Brian Mutasha; Sylvester Siame; Abel Siampale; Mark A. Friedl

doi:10.1038/s41597-023-02798-5

Scientific Data (Dec 2023)

Affiliations

Radost Stanimirova: Department of Earth and Environment, Boston University
Katelyn Tarrio: Department of Earth and Environment, Boston University
Konrad Turlej: Department of Earth and Environment, Boston University
Kristina McAvoy: Department of Earth and Environment, Boston University
Sophia Stonebrook: Department of Earth and Environment, Boston University
Kai-Ting Hu: Department of Earth and Environment, Boston University
Paulo Arévalo: Department of Earth and Environment, Boston University
Eric L. Bullock: Department of Earth and Environment, Boston University
Yingtong Zhang: Department of Earth and Environment, Boston University
Curtis E. Woodcock: Department of Earth and Environment, Boston University
Pontus Olofsson: Department of Earth and Environment, Boston University
Zhe Zhu: Department of Natural Resources and the Environment, University of Connecticut
Christopher P. Barber: U.S. Geological Survey (USGS), Earth Resources Observation and Science (EROS) Center
Carlos M. Souza: Imazon—Amazonia People and Environment Institute
Shijuan Chen: Department of Earth and Environment, Boston University
Jonathan A. Wang: School of Biological Sciences, University of Utah
Foster Mensah: Center for Remote Sensing and Geographic Information Services, University of Ghana
Marco Calderón-Loor: School of Life and Environmental Sciences, Deakin University
Michalis Hadjikakou: School of Life and Environmental Sciences, Deakin University
Brett A. Bryan: School of Life and Environmental Sciences, Deakin University
Jordan Graesser: Indigo Ag
Dereje L. Beyene: REDD+ Coordination Unit, Oromia Environmental Protection Authority
Brian Mutasha: Forestry Department Headquarters, Ministry of Green Economy and Environment
Sylvester Siame: Forestry Department Headquarters, Ministry of Green Economy and Environment
Abel Siampale: Forestry Department Headquarters, Ministry of Green Economy and Environment
Mark A. Friedl: Department of Earth and Environment, Boston University

Journal volume & issue: Vol. 10, no. 1, pp. 1–12

Abstract

State-of-the-art cloud computing platforms such as Google Earth Engine (GEE) enable regional-to-global land cover and land cover change mapping with machine learning algorithms. However, collecting the high-quality training data necessary for accurate land cover mapping remains costly and labor-intensive. To address this need, we created a global database of nearly 2 million training units spanning the period from 1984 to 2020 for seven primary and nine secondary land cover classes. Our training data collection approach leveraged GEE and machine learning algorithms to ensure data quality and biogeographic representation. We sampled the spectral-temporal feature space from Landsat imagery to efficiently allocate training data across global ecoregions and incorporated publicly available and collaborator-provided datasets into our database. To reflect the underlying regional class distribution and post-disturbance landscapes, we strategically augmented the database. We used a machine learning-based cross-validation procedure to remove potentially mislabeled training units. Our training database is relevant for a wide array of studies on topics such as land cover change, agriculture, forestry, hydrology, and urban development, among many others.
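The abstract does not spell out how the cross-validation filter works. As a rough illustration only, the sketch below assumes a k-fold scheme. Each training unit is classified by a model fit on the other folds, and the unit is flagged as potentially mislabeled when that held-out prediction disagrees with its assigned label. A nearest-centroid classifier stands in for whatever machine learning model the authors actually used, so the example runs on the Python standard library alone. The function names, the two features, and the class labels are all hypothetical.

```python
import random
from math import dist


def fit_centroids(features, labels):
    """Return the mean feature vector (centroid) for each class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in total] for y, total in sums.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda y: dist(centroids[y], x))


def flag_mislabeled(features, labels, k=5, seed=0):
    """Return indices of units whose out-of-fold prediction disagrees with their label.

    Hypothetical stand-in for a cross-validation label filter: the classifier
    and the single-disagreement flagging rule are assumptions, not the paper's.
    """
    order = list(range(len(features)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]  # k disjoint, roughly equal folds
    flagged = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        centroids = fit_centroids([features[i] for i in train],
                                  [labels[i] for i in train])
        # Each unit is predicted only by a model that never saw it.
        for i in fold:
            if predict(centroids, features[i]) != labels[i]:
                flagged.append(i)
    return sorted(flagged)


# Hypothetical toy data: two spectral-temporal features per training unit.
features = [
    (0.05, 0.02), (0.06, 0.03), (0.04, 0.01), (0.05, 0.03), (0.07, 0.02),  # water
    (0.30, 0.40), (0.32, 0.42), (0.29, 0.38), (0.31, 0.41), (0.33, 0.39),  # forest
    (0.31, 0.40),  # looks like forest but carries a "water" label
]
labels = ["water"] * 5 + ["forest"] * 5 + ["water"]

print(flag_mislabeled(features, labels))  # [10]
```

Flagging on a single out-of-fold disagreement is the simplest possible rule. A stricter variant might repeat the split with several seeds and drop only units that are misclassified in most repetitions. That variant trades a smaller, more conservative removal set for extra computation.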

Published in Scientific Data

ISSN: 2052-4463 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/sdata/