PLoS ONE (Jan 2024)

Pollen identification through convolutional neural networks: First application on a full fossil pollen sequence.

  • Médéric Durand,
  • Jordan Paillard,
  • Marie-Pier Ménard,
  • Thomas Suranyi,
  • Pierre Grondin,
  • Olivier Blarquez

DOI
https://doi.org/10.1371/journal.pone.0302424
Journal volume & issue
Vol. 19, no. 4
p. e0302424

Abstract

The automation of pollen identification has improved greatly in recent years, with Convolutional Neural Networks emerging as the preferred tool for training models. Still, only a small portion of the published work addresses the identification of fossil pollen. Fossil pollen is commonly extracted from organic sediment cores and is used by paleoecologists to reconstruct past environments, flora, vegetation, and their evolution through time. Automating fossil pollen identification would allow paleoecologists to save both time and money while reducing bias and uncertainty. However, Convolutional Neural Networks require a large amount of training data, and databases of fossilized pollen are rare and often incomplete. Since machine learning models are usually trained on labelled fresh pollen from many different species, there is a gap between the training data and the target data. We propose a method for a large-scale fossil pollen identification workflow. Our method employs an accelerated fossil pollen extraction protocol and Convolutional Neural Networks trained on the labelled fresh pollen of the species most commonly found in Northeastern American organic sediments. We first test our model on fresh pollen and then on a full fossil pollen sequence totalling 196,526 images. Our model achieved an average per-class accuracy of 91.2% when tested against fresh pollen. However, we find that it does not perform as well on fossil data. While our model is overconfident in its predictions, the general abundance patterns remain consistent with the identifications made by traditional palynologists. Although not yet capable of accurately classifying a whole fossil pollen sequence, our model serves as a proof of concept towards a full large-scale identification workflow.
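The abstract reports an average per-class accuracy without defining it. As a minimal sketch, assuming the metric is the fraction of correctly classified images within each class, averaged with equal weight across classes, it could be computed as follows. The function names, the taxa, and the toy labels are illustrative assumptions and do not come from the paper.

```python
from collections import defaultdict


def per_class_accuracy(true_labels, predicted_labels):
    """Return {class: fraction of that class's samples predicted correctly}."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one labelled sample is required")
    totals = defaultdict(int)   # number of samples per true class
    correct = defaultdict(int)  # number of correct predictions per true class
    for truth, prediction in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == prediction:
            correct[truth] += 1
    return {cls: correct[cls] / totals[cls] for cls in totals}


def average_per_class_accuracy(true_labels, predicted_labels):
    """Unweighted mean of the per-class accuracies (assumed definition)."""
    scores = per_class_accuracy(true_labels, predicted_labels)
    return sum(scores.values()) / len(scores)


# Toy data with hypothetical labels: 4 Picea grains and 2 Betula grains.
true_labels = ["Picea", "Picea", "Picea", "Picea", "Betula", "Betula"]
predicted_labels = ["Picea", "Picea", "Picea", "Betula", "Betula", "Picea"]

print(per_class_accuracy(true_labels, predicted_labels))
print(average_per_class_accuracy(true_labels, predicted_labels))
```

Under this assumed definition every class counts equally, so a rare taxon that is often misclassified lowers the score as much as a common one would. On the toy data the two values differ: the per-class average is 0.625, while the overall accuracy is 4 of 6.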