Journal of Micropalaeontology (Nov 2022)

Artificial intelligence applied to the classification of eight middle Eocene species of the genus Podocyrtis (polycystine radiolaria)

  • V. Carlsson,
  • T. Danelian,
  • P. Boulet,
  • P. Devienne,
  • A. Laforge,
  • J. Renaudie

DOI: https://doi.org/10.5194/jm-41-165-2022
Vol. 41, pp. 165–182

Abstract


This study evaluates the application of artificial intelligence (AI) to the automatic classification of radiolarians, using as a case study eight distinct morphospecies of the Eocene radiolarian genus Podocyrtis, which belong to three different evolutionary lineages and are useful in biostratigraphy. The samples used in this study were recovered from the equatorial Atlantic (ODP Leg 207) and were supplemented with some samples from the North Atlantic and Indian oceans. To create an automatic classification tool, numerous images of the investigated species were needed to train a MobileNet convolutional neural network coded entirely in Python. Three different datasets were obtained. The first consists of a mixture of broken and complete specimens, some of which appear blurry. The second and third datasets were produced in two further filtering steps that progressively excluded broken and blurry specimens, thereby increasing image quality. Of all specimens, 85 % were randomly selected for training and the remaining 15 % were used for validation. The MobileNet architecture achieved an overall accuracy of about 91 % on all datasets. Three predictive models, one trained on each dataset, were then created; they worked well for classifying Podocyrtis from the Indian Ocean (Madingley Rise, ODP Leg 115, Hole 711A) and the western North Atlantic Ocean (New Jersey slope, DSDP Leg 95, Hole 612, and Blake Nose, ODP Leg 171B, Hole 1051A). These samples also provided clearer images because they were mounted with Canada balsam rather than Norland epoxy. Despite some morphological differences between different parts of the world's oceans and differences in image quality, most species were either correctly classified or at least assigned to a neighboring species along the same lineage.
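
The abstract reports a random 85 %/15 % training/validation split and notes that specimens that were not classified correctly were often assigned to a neighboring species along the same lineage. The following is a minimal sketch, assuming plain Python and no deep-learning library, that illustrates both ideas: a seeded random split and a "lineage-tolerant" accuracy that also accepts a prediction of an adjacent species in the same lineage. The lineage labels (A1 to C2), the seed, and all function names are hypothetical placeholders; this is not the authors' code, and the MobileNet training step itself is not shown.

```python
import random

# Hypothetical lineages, each ordered from ancestor to descendant.
# The labels are placeholders (3 + 3 + 2 = 8 morphospecies), not the
# actual Podocyrtis species or the lineage assignments used in the study.
LINEAGES = [("A1", "A2", "A3"), ("B1", "B2", "B3"), ("C1", "C2")]


def split_specimens(specimens, train_fraction=0.85, seed=0):
    """Randomly partition specimens into training and validation lists."""
    shuffled = list(specimens)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def lineage_position(species):
    """Return (lineage index, position within lineage) for a species label."""
    for lineage_index, lineage in enumerate(LINEAGES):
        if species in lineage:
            return lineage_index, lineage.index(species)
    raise KeyError(f"unknown species label: {species}")


def is_same_or_neighbor(true_label, predicted_label):
    """True if the prediction is the true species or an adjacent one in its lineage."""
    true_lineage, true_pos = lineage_position(true_label)
    pred_lineage, pred_pos = lineage_position(predicted_label)
    return true_lineage == pred_lineage and abs(true_pos - pred_pos) <= 1


def accuracies(true_labels, predicted_labels):
    """Return (exact accuracy, lineage-tolerant accuracy) as fractions."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(true_labels, predicted_labels))
    exact = sum(t == p for t, p in pairs) / len(pairs)
    tolerant = sum(is_same_or_neighbor(t, p) for t, p in pairs) / len(pairs)
    return exact, tolerant


if __name__ == "__main__":
    train, validation = split_specimens(range(200))
    print(len(train), len(validation))  # 170 30
    print(accuracies(["A1", "A2", "B3", "C1"], ["A1", "A3", "B1", "C1"]))  # (0.5, 0.75)
```

In the usage example, A2 predicted as A3 counts only toward the tolerant score, while B3 predicted as B1 (two steps apart) counts toward neither. A stratified split, keeping the same proportions within each species, is a common alternative when classes are unbalanced; the abstract states only that selection was random.
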
Classification improved slightly for some species when images that had not segmented properly during image processing were cropped and/or had background particles removed. However, across these cropping and background-removal treatments, the best result came from the predictive model trained on the normal stacked dataset, which consists of a mixture of broken and complete specimens.
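
The abstract does not describe how cropping and background-particle removal were carried out. As one possible approach, and only as a minimal sketch, the following assumes a grayscale image stored as nested lists of 0 to 255 values, a fixed brightness threshold (128, hypothetical), and that the specimen is the largest 4-connected bright region. Every pixel outside that region is set to the background value, and the image is cropped to the region's bounding box. The helpers `largest_component` and `clean_and_crop` are hypothetical and not part of the study's pipeline.

```python
from collections import deque


def largest_component(mask):
    """Return the (row, col) pixels of the largest 4-connected True region in mask."""
    rows, cols = len(mask), len(mask[0])
    seen = set()
    best = set()
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or (r, c) in seen:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            component = set()
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.add((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return best


def clean_and_crop(image, threshold=128, background=0):
    """Keep only the largest bright region (blanking all other pixels), then crop to it.

    Assumes the specimen is brighter than `threshold`; invert dark-on-light
    images first.
    """
    mask = [[value >= threshold for value in row] for row in image]
    keep = largest_component(mask)
    if not keep:
        return [row[:] for row in image]  # nothing segmented: return an unchanged copy
    cleaned = [
        [value if (r, c) in keep else background for c, value in enumerate(row)]
        for r, row in enumerate(image)
    ]
    top = min(r for r, _ in keep)
    bottom = max(r for r, _ in keep)
    left = min(c for _, c in keep)
    right = max(c for _, c in keep)
    return [row[left:right + 1] for row in cleaned[top:bottom + 1]]


if __name__ == "__main__":
    image = [
        [0, 0, 0, 0, 0, 0],
        [0, 200, 210, 0, 0, 0],
        [0, 220, 230, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 250],  # isolated bright "background particle"
    ]
    print(clean_and_crop(image))  # [[200, 210], [220, 230]]
```

In practice, libraries such as scikit-image or OpenCV provide thresholding and connected-component labelling that are far faster on real micrographs; the pure-Python version only makes the individual steps explicit.
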