How many specimens make a sufficient training set for automated three-dimensional feature extraction?

James M. Mulqueeney; Alex Searle-Barnes; Anieke Brombacher; Marisa Sweeney; Anjali Goswami; Thomas H. G. Ezard

doi:10.1098/rsos.240113

Royal Society Open Science (Jun 2024)

Affiliations

James M. Mulqueeney: School of Ocean & Earth Science, National Oceanography Centre Southampton, University of Southampton Waterfront Campus, Southampton, UK
Alex Searle-Barnes: School of Ocean & Earth Science, National Oceanography Centre Southampton, University of Southampton Waterfront Campus, Southampton, UK
Anieke Brombacher: School of Ocean & Earth Science, National Oceanography Centre Southampton, University of Southampton Waterfront Campus, Southampton, UK
Marisa Sweeney: School of Ocean & Earth Science, National Oceanography Centre Southampton, University of Southampton Waterfront Campus, Southampton, UK
Anjali Goswami: Department of Life Sciences, Natural History Museum, London, UK
Thomas H. G. Ezard: School of Ocean & Earth Science, National Oceanography Centre Southampton, University of Southampton Waterfront Campus, Southampton, UK

DOI: https://doi.org/10.1098/rsos.240113
Journal volume & issue: Vol. 11, no. 6

Abstract

Deep learning has emerged as a robust tool for automating feature extraction from three-dimensional images, offering an efficient alternative to labour-intensive and potentially biased manual image segmentation. However, optimal training set sizes remain little explored, including whether artificial expansion by data augmentation can achieve consistent results in less time and how consistent these benefits are across different types of traits. In this study, we manually segmented 50 planktonic foraminifera specimens from the genus Menardella to determine the minimum number of training images required to produce accurate volumetric and shape data from internal and external structures. The results reveal, unsurprisingly, that deep learning models improve with a larger number of training images, with eight specimens being required to achieve 95% accuracy. Furthermore, data augmentation can enhance network accuracy by up to 8.0%. Notably, predicting both volumetric and shape measurements for the internal structure poses a greater challenge than for the external structure, owing to low contrast between different materials and increased geometric complexity. These results provide novel insight into optimal training set sizes for precise image segmentation of diverse traits and highlight the potential of data augmentation for enhancing multivariate feature extraction from three-dimensional images.

Published in Royal Society Open Science

ISSN: 2054-5703 (Online)
Publisher: The Royal Society
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://royalsocietypublishing.org/journal/rsos
