BMC Genomics (Feb 2021)

Authoritative subspecies diagnosis tool for European honey bees based on ancestry informative SNPs

  • Jamal Momeni,
  • Melanie Parejo,
  • Rasmus O. Nielsen,
  • Jorge Langa,
  • Iratxe Montes,
  • Laetitia Papoutsis,
  • Leila Farajzadeh,
  • Christian Bendixen,
  • Eliza Căuia,
  • Jean-Daniel Charrière,
  • Mary F. Coffey,
  • Cecilia Costa,
  • Raffaele Dall’Olio,
  • Pilar De la Rúa,
  • M. Maja Drazic,
  • Janja Filipi,
  • Thomas Galea,
  • Miroljub Golubovski,
  • Ales Gregorc,
  • Karina Grigoryan,
  • Fani Hatjina,
  • Rustem Ilyasov,
  • Evgeniya Ivanova,
  • Irakli Janashia,
  • Irfan Kandemir,
  • Aikaterini Karatasou,
  • Meral Kekecoglu,
  • Nikola Kezic,
  • Enikö Sz. Matray,
  • David Mifsud,
  • Rudolf Moosbeckhofer,
  • Alexei G. Nikolenko,
  • Alexandros Papachristoforou,
  • Plamen Petrov,
  • M. Alice Pinto,
  • Aleksandr V. Poskryakov,
  • Aglyam Y. Sharipov,
  • Adrian Siceanu,
  • M. Ihsan Soysal,
  • Aleksandar Uzunov,
  • Marion Zammit-Mangion,
  • Rikke Vingborg,
  • Maria Bouga,
  • Per Kryger,
  • Marina D. Meixner,
  • Andone Estonba

DOI
https://doi.org/10.1186/s12864-021-07379-7
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 12

Abstract


Background: With numerous endemic subspecies representing four of its five evolutionary lineages, Europe holds a large fraction of Apis mellifera genetic diversity. Anthropogenic factors have altered both this diversity and the natural distribution range. Conserving this natural heritage relies on accurate tools for subspecies diagnosis. Based on pool-sequence data from 2145 worker bees representing 22 populations sampled across Europe, we employed two highly discriminative approaches (PCA and FST) to select the most informative single nucleotide polymorphisms (SNPs) for ancestry inference.

Results: Using a supervised machine learning (ML) approach and a set of 3896 genotyped individuals, we show that the 4094 selected SNPs provide accurate ancestry inference in European honey bees. The best ML model was the Linear Support Vector Classifier (Linear SVC), which correctly assigned most individuals to one of the 14 subspecies or different genetic origins with a mean accuracy of 96.2% ± 0.8 SD. The remaining 3.8% of test individuals were misclassified, most probably because of limited differentiation between geographically close subspecies, human interference with the genetic integrity of the reference subspecies, or a combination of both.

Conclusions: The diagnostic tool presented here will contribute to sustainable conservation and support breeding activities aimed at preserving the genetic heritage of European honey bees.
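The abstract mentions FST-based selection of ancestry-informative SNPs from pool-sequence data but does not specify the estimator or cut-off. The sketch below is an illustration only, not the authors' pipeline. It assumes per-population reference-allele frequencies have already been estimated from pooled reads and uses Nei's heterozygosity-based FST (G_ST), weighting populations equally. The function names `nei_fst` and `top_snps_by_fst` and the SNP identifiers are hypothetical. The PCA-based selection and the Linear SVC classifier are not shown.

```python
from typing import Dict, List, Sequence, Tuple


def nei_fst(freqs: Sequence[float]) -> float:
    """Heterozygosity-based FST (Nei's G_ST) for one biallelic SNP.

    Assumption: `freqs` holds the reference-allele frequency in each
    population (e.g. estimated from pooled read counts), and all
    populations are weighted equally.
    """
    k = len(freqs)
    p_bar = sum(freqs) / k
    # Expected heterozygosity of the total (pooled) population.
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        return 0.0  # monomorphic across all populations: carries no ancestry signal
    # Mean expected heterozygosity within populations.
    h_s = sum(2.0 * p * (1.0 - p) for p in freqs) / k
    return (h_t - h_s) / h_t


def top_snps_by_fst(snp_freqs: Dict[str, Sequence[float]], k: int) -> List[Tuple[str, float]]:
    """Rank SNPs by FST (highest first) and keep the k most differentiated ones."""
    scored = [(snp, nei_fst(f)) for snp, f in snp_freqs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical frequencies for three SNPs in three populations.
    example = {
        "snp_A": [0.95, 0.05, 0.50],  # strongly differentiated
        "snp_B": [0.48, 0.52, 0.50],  # nearly identical frequencies
        "snp_C": [0.00, 0.00, 0.00],  # monomorphic
    }
    for snp, fst in top_snps_by_fst(example, 2):
        print(snp, round(fst, 3))
```

Ranking by a single differentiation statistic is the simplest selection rule. A real panel would also need to handle sampling noise in pool-seq frequencies and redundancy between linked SNPs, and the abstract does not say how the authors addressed either.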

Keywords