Methods in Ecology and Evolution (Feb 2023)
plantR: An R package and workflow for managing species records from biological collections
Abstract
Species records from biological collections are becoming increasingly available online. This unprecedented availability of records has largely supported recent studies in taxonomy, biogeography, macroecology and biodiversity conservation. Biological collections vary in their documentation and notation standards, which have changed through time. For different reasons, neither collections nor data repositories perform the editing, formatting and standardisation of the data, leaving these tasks to the final users of the species records (e.g. taxonomists, ecologists and conservationists). These tasks are challenging, particularly when working with millions of records from hundreds of biological collections. To help collection curators and final users perform those tasks, we introduce plantR, an open‐source package that provides a comprehensive toolbox to manage species records from biological collections. The package is accompanied by a proposed reproducible workflow for managing this type of data in taxonomy, ecology and biodiversity conservation. It is implemented in R and designed to handle relatively large datasets as fast as possible. Although initially designed to handle plant species records, many of the plantR features also apply to other groups of organisms, provided that the data structure is similar. The plantR workflow includes tools to (a) download records from different data repositories, (b) standardise typical fields associated with species records, (c) validate the locality, geographical coordinates, taxonomic nomenclature and species identifications, including the retrieval of duplicates across collections, and (d) summarise and export records, including the construction of species lists with vouchers. Other R packages provide tools to tackle some of the workflow steps described above. However, beyond its new tools and resources for data standardisation and validation, the greatest strength of plantR is that it provides a comprehensive and user‐friendly workflow in a single environment, performing all tasks from data retrieval to export. Thus, plantR can help researchers better assess data quality and avoid data leakage in a wide variety of studies using species records.
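As a rough illustration of workflow steps (b) to (d), the sketch below standardises, validates and summarises three invented records. It is written in Python purely for illustration: plantR itself is an R package, and none of the function names, field names or records here come from its API or data. They are all hypothetical. Step (a), downloading from a repository, is left out so that the example stays self-contained.

```python
from collections import defaultdict

# Invented toy records; field names are hypothetical, not plantR's.
RECORDS = [
    {"collection": "HERB1", "species": "Euterpe edulis ", "collector": "Silva, J.",
     "number": "123", "lat": -23.5, "lon": -46.6},
    {"collection": "HERB2", "species": "euterpe edulis", "collector": "SILVA, J.",
     "number": "123", "lat": -23.5, "lon": -46.6},
    {"collection": "HERB1", "species": "Araucaria angustifolia", "collector": "Souza, M.",
     "number": "45", "lat": 95.0, "lon": -50.0},
]


def standardise(rec):
    """Step (b): trim whitespace and normalise capitalisation of key fields."""
    out = dict(rec)
    genus, *rest = rec["species"].split()
    out["species"] = " ".join([genus.capitalize()] + [w.lower() for w in rest])
    out["collector"] = rec["collector"].strip().title()
    return out


def valid_coords(rec):
    """Step (c): flag coordinates that fall outside the possible range."""
    return -90 <= rec["lat"] <= 90 and -180 <= rec["lon"] <= 180


def group_duplicates(records):
    """Step (c): group records sharing collector name and collector number."""
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["collector"], rec["number"])].append(rec)
    return groups


def species_list(records):
    """Step (d): build a species list with its vouchers."""
    vouchers = defaultdict(set)
    for rec in records:
        label = f'{rec["collector"]} {rec["number"]} ({rec["collection"]})'
        vouchers[rec["species"]].add(label)
    return {sp: sorted(v) for sp, v in vouchers.items()}


clean = [standardise(r) for r in RECORDS]
for r in clean:
    r["coords_ok"] = valid_coords(r)

# Keep only groups found in more than one record (candidate duplicates).
dups = {k: v for k, v in group_duplicates(clean).items() if len(v) > 1}
checklist = species_list(clean)
```

Note that the two Euterpe edulis records only become recognisable as duplicates held by different collections after standardisation, which is why the formatting step comes before validation in the workflow.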
Keywords