Applications in Plant Sciences (Jun 2020)

Generating segmentation masks of herbarium specimens and a data set for training segmentation models using deep learning

  • Alexander E. White,
  • Rebecca B. Dikow,
  • Makinnon Baugh,
  • Abigail Jenkins,
  • Paul B. Frandsen

DOI
https://doi.org/10.1002/aps3.11352
Journal volume & issue
Vol. 8, no. 6

Abstract

Premise: Digitized images of herbarium specimens are highly diverse with many potential sources of visual noise and bias. The systematic removal of noise and minimization of bias must be achieved in order to generate biological insights based on the plants rather than the digitization and mounting practices involved. Here, we develop a workflow and data set of high‐resolution image masks to segment plant tissues in herbarium specimen images and remove background pixels using deep learning.

Methods and Results: We generated 400 curated, high‐resolution masks of ferns using a combination of automatic and manual tools for image manipulation. We used those images to train a U‐Net‐style deep learning model for image segmentation, achieving a final Sørensen–Dice coefficient of 0.96. The resulting model can automatically, efficiently, and accurately segment massive data sets of digitized herbarium specimens, particularly for ferns.

Conclusions: The application of deep learning in herbarium sciences requires transparent and systematic protocols for generating training data so that these labor‐intensive resources can be generalized to other deep learning applications. Segmentation ground‐truth masks are hard‐won data, and we share these data and the model openly in the hopes of furthering model training and transfer learning opportunities for broader herbarium applications.
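The Sørensen–Dice coefficient reported above measures the overlap between a predicted mask and a ground‐truth mask as 2|A ∩ B| / (|A| + |B|), where 1 is perfect agreement and 0 is no overlap. The sketch below illustrates that computation, together with background removal by masking, on tiny binary masks stored as nested lists. It is not the authors' code: the function names `dice_coefficient` and `apply_mask`, the convention of returning 1.0 for two empty masks, and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence

Mask = Sequence[Sequence[int]]


def dice_coefficient(pred: Mask, truth: Mask) -> float:
    """Sørensen–Dice overlap between two binary masks of identical shape.

    Any non-zero value counts as foreground (plant tissue).
    """
    if len(pred) != len(truth) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(pred, truth)
    ):
        raise ValueError("masks must have the same shape")

    intersection = 0  # pixels that are foreground in both masks
    pred_total = 0    # foreground pixels in the prediction
    truth_total = 0   # foreground pixels in the ground truth
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            p_on, t_on = bool(p), bool(t)
            intersection += p_on and t_on
            pred_total += p_on
            truth_total += t_on

    denominator = pred_total + truth_total
    if denominator == 0:
        # Assumption: two empty masks are treated as a perfect match.
        return 1.0
    return 2.0 * intersection / denominator


def apply_mask(image: Sequence[Sequence[int]], mask: Mask,
               background: int = 0) -> List[List[int]]:
    """Keep pixels where the mask is non-zero; set the rest to `background`."""
    return [
        [pixel if m else background for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Toy 2 x 3 example (hypothetical data, not from the paper).
truth = [[0, 1, 1],
         [0, 1, 0]]
pred = [[0, 1, 1],
        [1, 1, 0]]
image = [[10, 20, 30],
         [40, 50, 60]]

score = dice_coefficient(pred, truth)  # 2 * 3 / (4 + 3) = 6/7
segmented = apply_mask(image, pred)    # background pixels set to 0
```

In this toy case the prediction includes one extra pixel, so the score is 6/7 (about 0.86). A score of 0.96 on full‐resolution specimen images would correspond to far closer agreement between predicted and curated masks.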

Keywords