Data on individual tree crowns from remote sensing have the potential to advance forest ecology by providing information on forest composition and structure with continuous coverage over large spatial extents. Classifying individual trees to taxonomic species over large regions from remote sensing data is challenging. Methods to classify individual species are often accurate for common species but perform poorly for less common species and when applied to new sites. We ran a data science competition to help identify effective methods for classifying individual crowns to species. The competition included data from three sites to assess each method's ability to generalize across two sites simultaneously and to transfer to an untrained site. Three metrics were used to assess and compare model performance. Six teams participated, comprising nine individuals from four countries. The highest-performing method from a previous competition in 2017 was applied as a baseline to understand advancements and changes in successful methods. The best species classification method was based on a two-stage fully connected neural network that significantly outperformed the baseline random forest and gradient boosting ensemble methods. All methods generalized well across the trained sites, showing relatively strong performance (accuracy = 0.46–0.55, macro F1 = 0.09–0.32, cross-entropy loss = 2.4–9.2), but generally failed to transfer effectively to the untrained site (accuracy = 0.07–0.32, macro F1 = 0.02–0.18, cross-entropy loss = 2.8–16.3). Classification performance was influenced by the number of samples with species labels available for training: most methods predicted common species at the training sites well (maximum F1 score of 0.86), whereas uncommon species were never predicted. Classification errors were most common between species in the same genus and between species that occur in the same habitat. Most methods performed better than the baseline at detecting whether a species was absent from the training data by predicting an untrained mixed-species class, especially at the untrained site. This work highlights that data science competitions can encourage the advancement of methods, particularly by bringing in new people from outside the focal discipline and by providing an open dataset and evaluation criteria from which participants can learn.
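For readers unfamiliar with the three evaluation metrics named above (overall accuracy, macro-averaged F1, and cross-entropy loss), the following is a minimal illustrative sketch of how such crown-level metrics can be computed with scikit-learn. It is not the competition's evaluation code; the species codes, labels, and probabilities are invented for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, log_loss

# Hypothetical species codes, listed alphabetically so that the columns of
# the probability matrix match the label order expected by log_loss.
species = ["ACRU", "OTHER", "PIPA", "QULA"]

# Invented reference labels and predictions for five crowns.
y_true = ["PIPA", "PIPA", "ACRU", "QULA", "OTHER"]
y_pred = ["PIPA", "ACRU", "ACRU", "QULA", "OTHER"]

# Invented per-class probabilities (columns: ACRU, OTHER, PIPA, QULA).
y_prob = np.array([
    [0.05, 0.05, 0.80, 0.10],
    [0.60, 0.05, 0.25, 0.10],
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.05, 0.05, 0.85],
    [0.10, 0.70, 0.10, 0.10],
])

# Overall accuracy: fraction of crowns assigned the correct species.
accuracy = accuracy_score(y_true, y_pred)

# Macro F1: unweighted mean of per-species F1, so rare species count as
# much as common ones (which is why it drops when rare species are missed).
macro_f1 = f1_score(y_true, y_pred, average="macro", labels=species)

# Cross-entropy (log) loss: penalizes confident but wrong probabilities.
loss = log_loss(y_true, y_prob, labels=species)

print(f"accuracy={accuracy:.2f}, macro F1={macro_f1:.2f}, cross entropy={loss:.2f}")
```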