Biochemistry and Biophysics Reports (Jul 2024)

Integration of machine learning models with microsatellite markers: New avenue in world grapevine germplasm characterization

  • Hossein Abbasi Holasou,
  • Bahman Panahi,
  • Ali Shahi,
  • Yousef Nami

Journal volume & issue
Vol. 38
p. 101678

Abstract

Efficient analytical techniques are required to interpret biological data effectively, generate novel hypotheses, and find critical predictive patterns. Machine learning algorithms offer a new opportunity to develop low-cost, practical solutions in biology. In this study, we propose a new integrated analytical approach that applies supervised machine learning algorithms to microsatellite data from worldwide Vitis populations. A total of 1378 wild (V. vinifera subsp. sylvestris) and cultivated (V. vinifera subsp. sativa) grapevine accessions were investigated using 20 microsatellite markers. Data cleaning, feature selection, and supervised machine learning classification models, viz. Naive Bayes, Support Vector Machine (SVM), and Tree Induction methods, were applied to find the most indicative and diagnostic alleles representing the wild/cultivated status and geographic origin of each population. Our combined approach identified the microsatellite markers with the highest differentiating capacity and demonstrated the efficiency of our pipeline for classifying and predicting Vitis accessions. Moreover, our study proposes the best combination of markers for distinguishing populations, which can be exploited in future germplasm conservation and breeding programs.
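
As an illustration of how diagnostic alleles might be ranked before classification, the following minimal sketch scores each allele by information gain with respect to the wild/cultivated label. The abstract does not state which feature selection criterion was used, so information gain is an assumption here; the marker names (SSR_A, SSR_B), allele sizes, and accessions are hypothetical toy data, not values from the study.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def information_gain(has_allele, labels):
    """Drop in label entropy after splitting on presence/absence of one allele."""
    total = len(labels)
    remainder = 0.0
    for value in (True, False):
        subset = [lab for flag, lab in zip(has_allele, labels) if flag == value]
        if subset:
            remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def rank_alleles(genotypes, labels):
    """Return ((marker, allele), score) pairs, most informative first."""
    alleles = sorted(
        {(marker, a) for g in genotypes for marker, pair in g.items() for a in pair}
    )
    scores = {}
    for marker, allele in alleles:
        # Presence/absence encoding: does this accession carry the allele?
        presence = [allele in g.get(marker, ()) for g in genotypes]
        scores[(marker, allele)] = information_gain(presence, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: diploid genotypes at two made-up markers.
genotypes = [
    {"SSR_A": (133, 143), "SSR_B": (180, 186)},
    {"SSR_A": (133, 135), "SSR_B": (180, 180)},
    {"SSR_A": (143, 151), "SSR_B": (180, 186)},
    {"SSR_A": (151, 151), "SSR_B": (186, 190)},
]
labels = ["wild", "wild", "cultivated", "cultivated"]

ranked = rank_alleles(genotypes, labels)
for (marker, allele), score in ranked:
    print(f"{marker}:{allele}\t{score:.3f}")
```

In this toy set, alleles carried only by one group receive the highest scores, while an allele shared equally by both groups scores zero. The top-ranked alleles could then serve as input features for a classifier such as Naive Bayes, an SVM, or a decision tree.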

Keywords