Recommending plant taxa for supporting on-site species identification

Hans Christian Wittich; Marco Seeland; Jana Wäldchen; Michael Rzanny; Patrick Mäder

doi:10.1186/s12859-018-2201-7

BMC Bioinformatics (May 2018)

Affiliations

Hans Christian Wittich: Institute for Computer and Systems Engineering, Technische Universität Ilmenau
Marco Seeland: Institute for Computer and Systems Engineering, Technische Universität Ilmenau
Jana Wäldchen: Department Biogeochemical Integration, Max-Planck-Institute for Biogeochemistry
Michael Rzanny: Department Biogeochemical Integration, Max-Planck-Institute for Biogeochemistry
Patrick Mäder: Institute for Computer and Systems Engineering, Technische Universität Ilmenau

Journal volume & issue: Vol. 19, no. 1, pp. 1–17

Abstract

Background: Predicting a list of plant taxa most likely to be observed at a given geographical location and time is useful for many scenarios in biodiversity informatics. Since efficient plant species identification is impeded mainly by the large number of possible candidate species, providing a shortlist of likely candidates can significantly expedite the task. Whereas species distribution models rely heavily on geo-referenced occurrence data, such information remains largely unused in plant taxa identification tools.

Results: In this paper, we conduct a study on the feasibility of computing a ranked shortlist of plant taxa likely to be encountered by an observer in the field. We use the territory of Germany as a case study with a total of 7.62M records of freely available plant presence-absence data and occurrence records for 2.7k plant taxa. We systematically study achievable recommendation quality based on two types of source data: binary presence-absence data and individual occurrence records. Furthermore, we study strategies for aggregating records into a taxa recommendation based on the location and date of an observation.

Conclusion: We evaluate recommendations using 28k geo-referenced and taxa-labeled plant images hosted on the Flickr website as an independent test dataset. Relying on location information from presence-absence data alone results in an average recall of 82%. However, we find that occurrence records are complementary to presence-absence data, and using both in combination yields a considerably higher recall of 96% along with improved ranking metrics. Ultimately, by reducing the list of candidate taxa by an average of 62%, a spatio-temporal prior can substantially expedite the overall identification problem.
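To make the idea of a spatio-temporal shortlist concrete, the following is a minimal Python sketch, not the authors' method. It assumes occurrence records are plain `(taxon, latitude, longitude, day_of_year)` tuples and ranks taxa by how many records fall within a fixed radius and seasonal window of the observation. The function names, the 10 km radius, the 30-day window, and the sample records are all hypothetical choices made for illustration; the paper's actual aggregation strategies are not described in the abstract.

```python
from collections import Counter
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def day_gap(day1, day2):
    """Circular distance in days between two day-of-year values (365-day year)."""
    diff = abs(day1 - day2) % 365
    return min(diff, 365 - diff)


def recommend(records, lat, lon, day, radius_km=10.0, day_window=30):
    """Rank taxa by the number of nearby, in-season occurrence records.

    radius_km and day_window are illustrative defaults, not values from the paper.
    """
    counts = Counter(
        taxon
        for taxon, rlat, rlon, rday in records
        if haversine_km(lat, lon, rlat, rlon) <= radius_km
        and day_gap(day, rday) <= day_window
    )
    return [taxon for taxon, _ in counts.most_common()]


def recall(records, observations, **kwargs):
    """Fraction of labeled observations whose taxon appears in its shortlist."""
    hits = sum(
        1
        for taxon, lat, lon, day in observations
        if taxon in recommend(records, lat, lon, day, **kwargs)
    )
    return hits / len(observations)


# Hypothetical sample data: (taxon, latitude, longitude, day of year).
SAMPLE_RECORDS = [
    ("Bellis perennis", 50.68, 10.93, 120),
    ("Bellis perennis", 50.69, 10.94, 130),
    ("Taraxacum officinale", 50.68, 10.92, 125),
    ("Galanthus nivalis", 50.68, 10.93, 40),   # nearby, but out of season
    ("Armeria maritima", 54.00, 8.50, 125),    # in season, but far away
]

SAMPLE_OBSERVATIONS = [
    ("Bellis perennis", 50.68, 10.93, 125),
    ("Galanthus nivalis", 50.68, 10.93, 125),
]
```

Calling `recommend(SAMPLE_RECORDS, 50.68, 10.93, 125)` yields a shortlist ordered by record count, and `recall(SAMPLE_RECORDS, SAMPLE_OBSERVATIONS)` measures how often the true taxon survives the filtering. That is the same kind of metric the abstract reports, although computed here on toy data.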

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
