Data (Oct 2024)

Towards a Taxonomy Machine: A Training Set of 5.6 Million Arthropod Images

  • Dirk Steinke,
  • Sujeevan Ratnasingham,
  • Jireh Agda,
  • Hamzah Ait Boutou,
  • Isaiah C. H. Box,
  • Mary Boyle,
  • Dean Chan,
  • Corey Feng,
  • Scott C. Lowe,
  • Jaclyn T. A. McKeown,
  • Joschka McLeod,
  • Alan Sanchez,
  • Ian Smith,
  • Spencer Walker,
  • Catherine Y.-Y. Wei,
  • Paul D. N. Hebert

DOI: https://doi.org/10.3390/data9110122
Journal volume & issue: Vol. 9, no. 11, p. 122

Abstract

The taxonomic identification of organisms from images is an active research area within the machine learning community. Current algorithms are very effective for object recognition and discrimination, but they require extensive training datasets to generate reliable assignments. This study releases 5.6 million images with representatives from 10 arthropod classes and 26 insect orders. All images were taken using a Keyence VHX-7000 Digital Microscope system with an automatic stage to permit high-resolution (4K) microphotography. Providing phenotypic data for 324,000 species derived from 48 countries, this release represents, by far, the largest dataset of standardized arthropod images. As such, this dataset is well suited for testing the efficacy of machine learning algorithms for assigning specimens to higher taxonomic categories.

Keywords