Scientific Data (Nov 2023)
A globally synthesised and flagged bee occurrence dataset and cleaning workflow
- James B. Dorey,
- Erica E. Fischer,
- Paige R. Chesshire,
- Angela Nava-Bolaños,
- Robert L. O’Reilly,
- Silas Bossert,
- Shannon M. Collins,
- Elinor M. Lichtenberg,
- Erika M. Tucker,
- Allan Smith-Pardo,
- Armando Falcon-Brindis,
- Diego A. Guevara,
- Bruno Ribeiro,
- Diego de Pedro,
- John Pickering,
- Keng-Lou James Hung,
- Katherine A. Parys,
- Lindsie M. McCabe,
- Matthew S. Rogan,
- Robert L. Minckley,
- Santiago J. E. Velazco,
- Terry Griswold,
- Tracy A. Zarrillo,
- Walter Jetz,
- Yanina V. Sica,
- Michael C. Orr,
- Laura Melissa Guzman,
- John S. Ascher,
- Alice C. Hughes,
- Neil S. Cobb
Affiliations
- James B. Dorey
- College of Science and Engineering, Flinders University
- Erica E. Fischer
- Centre for the History of Science, Technology, and Medicine, Department of History, King’s College London, Strand
- Paige R. Chesshire
- Department of Biological Sciences, Northern Arizona University
- Angela Nava-Bolaños
- Unidad Multidisciplinaria de Docencia e Investigación, Facultad de Ciencias, Campus Juriquilla, Universidad Nacional Autónoma de México, Boulevard Juriquilla, Jurica La Mesa, Juriquilla
- Robert L. O’Reilly
- College of Science and Engineering, Flinders University
- Silas Bossert
- Department of Entomology, Washington State University
- Shannon M. Collins
- Department of Biological Sciences and Advanced Environmental Research Institute, University of North Texas
- Elinor M. Lichtenberg
- Department of Biological Sciences and Advanced Environmental Research Institute, University of North Texas
- Erika M. Tucker
- Biodiversity Outreach Network
- Allan Smith-Pardo
- Animal Plant Health Inspection Service (APHIS); Plant Protection and Quarantine (PPQ); Science and Technology (S&T); Pest Identification Technology Laboratory (PITL), United States Department of Agriculture (USDA)
- Armando Falcon-Brindis
- Department of Entomology, Research and Education Center, University of Kentucky, University Dr
- Diego A. Guevara
- Departamento de Biología, Universidad Nacional de Colombia
- Bruno Ribeiro
- Programa de Pós-graduação em Ecologia e Evolução, Universidade Federal de Goiás, Goiânia
- Diego de Pedro
- Ensenada Center for Scientific Research and Higher Education, Carr. Tijuana-Ensenada, Zona Playitas
- John Pickering
- Discover Life
- Keng-Lou James Hung
- Oklahoma Biological Survey, University of Oklahoma
- Katherine A. Parys
- USDA ARS Pollinator Health in Southern Crop Ecosystems Research Unit
- Lindsie M. McCabe
- USDA-ARS Pollinating Insects-Research Unit
- Matthew S. Rogan
- Center for Biodiversity and Global Change, Yale University
- Robert L. Minckley
- Department of Biology, University of Rochester
- Santiago J. E. Velazco
- Instituto de Biología Subtropical, Consejo Nacional de Investigaciones Científicas y Técnicas, Universidad Nacional de Misiones
- Terry Griswold
- USDA-ARS Pollinating Insects-Research Unit
- Tracy A. Zarrillo
- The Connecticut Agricultural Experiment Station
- Walter Jetz
- Center for Biodiversity and Global Change, Yale University
- Yanina V. Sica
- Center for Biodiversity and Global Change, Yale University
- Michael C. Orr
- Entomologie, Staatliches Museum für Naturkunde Stuttgart, Rosenstein, Stuttgart
- Laura Melissa Guzman
- Marine and Environmental Biology, Department of Biological Sciences, University of Southern California
- John S. Ascher
- Department of Biological Sciences, National University of Singapore, Science Dr
- Alice C. Hughes
- School of Biological Sciences, University of Hong Kong
- Neil S. Cobb
- Biodiversity Outreach Network
- DOI
- https://doi.org/10.1038/s41597-023-02626-w
- Journal volume & issue
- Vol. 10, no. 1, pp. 1–17
Abstract
Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data is a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats: "cleaned" and "flagged-but-uncleaned". The BeeBDC package and its online documentation let end users modify filtering parameters to suit their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation.
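The key design choice described above is to flag every record rather than delete it, then let users choose which flags to filter on. A minimal Python sketch of that flag-then-filter pattern follows. It is not the BeeBDC API, which is an R package. The `Record` class, the flag names `has_coordinates`, `has_date` and `not_duplicate`, and the functions `flag_records` and `clean` are all hypothetical. Treating species, coordinates and date as the duplicate key is also an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Record:
    """Hypothetical occurrence record; not BeeBDC's data structure."""
    record_id: str
    species: str
    country: str
    lat: Optional[float]
    lon: Optional[float]
    date: Optional[str]  # ISO "YYYY-MM-DD", or None when missing
    # flag name -> True if the record passes that check
    flags: Dict[str, bool] = field(default_factory=dict)


def flag_records(records: List[Record]) -> List[Record]:
    """Attach pass/fail flags to every record; nothing is removed."""
    seen = set()
    for r in records:
        r.flags["has_coordinates"] = r.lat is not None and r.lon is not None
        r.flags["has_date"] = r.date is not None
        # Assumed duplicate key: the first record with a key passes and later
        # records with the same key fail. Missing values are part of the key.
        key = (r.species, r.lat, r.lon, r.date)
        r.flags["not_duplicate"] = key not in seen
        seen.add(key)
    return records


def clean(records: List[Record], use_flags: List[str]) -> List[Record]:
    """Keep only records that pass every flag the user chose to filter on."""
    return [r for r in records if all(r.flags[name] for name in use_flags)]


# Illustrative records only, not drawn from the published dataset.
records = flag_records([
    Record("a1", "Apis mellifera", "Australia", -34.9, 138.6, "2020-01-05"),
    Record("a2", "Apis mellifera", "Australia", -34.9, 138.6, "2020-01-05"),
    Record("a3", "Bombus terrestris", "Chile", None, None, "2019-11-02"),
    Record("a4", "Xylocopa virginica", "United States", 35.2, -111.6, None),
])

strict = clean(records, ["has_coordinates", "has_date", "not_duplicate"])
lenient = clean(records, ["not_duplicate"])

print([r.record_id for r in strict])   # ['a1']
print([r.record_id for r in lenient])  # ['a1', 'a3', 'a4']
```

Because the flags are stored rather than acted on, one flagged-but-uncleaned list can yield different cleaned subsets. In this sketch, a range-mapping study might require coordinates, while a phenology study might only drop duplicates.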