PLoS ONE (Jan 2014)

Trainable high resolution melt curve machine learning classifier for large-scale reliable genotyping of sequence variants.

  • Pornpat Athamanolap,
  • Vishwa Parekh,
  • Stephanie I Fraley,
  • Vatsal Agarwal,
  • Dong J Shin,
  • Michael A Jacobs,
  • Tza-Huei Wang,
  • Samuel Yang

DOI
https://doi.org/10.1371/journal.pone.0109094
Journal volume & issue
Vol. 9, no. 9
p. e109094

Abstract

High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurately genotyping an unknown sample for which a large number of possible variants may exist requires an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new method for automated HRM curve classification based on machine learning and a learned tolerance for deviations in reaction conditions. We tested this method in silico through multiple cross-validations, using curves generated from 9 different simulated experimental conditions to classify 92 known serotypes of Streptococcus pneumoniae, and demonstrated over 99% accuracy with 8 training curves per serotype. We then verified the algorithm in vitro using sequence variants of a cancer-related gene and demonstrated 100% accuracy with 3 training curves per sequence variant. The machine learning algorithm enabled reliable, scalable, and automated HRM genotyping analysis with broad potential clinical and epidemiological applications.
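The general workflow the abstract describes (train on a few labeled melt curves per variant, then classify held-out curves recorded under slightly different conditions) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the classifier, so a simple nearest-centroid rule stands in for it. The sigmoid curve model, the variant names, the melting temperatures, and the 0.2 °C run-to-run shift are all hypothetical values chosen for illustration.

```python
import math
import random


def melt_curve(tm, temps, shift=0.0, width=1.0):
    """Idealized melt curve (assumed sigmoid): duplex fraction at each temperature."""
    return [1.0 / (1.0 + math.exp((t - (tm + shift)) / width)) for t in temps]


def centroid(curves):
    """Point-wise mean of a list of equal-length curves."""
    n = len(curves)
    return [sum(c[i] for c in curves) / n for i in range(len(curves[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two curves."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train(curves_by_variant):
    """Learn one mean curve per variant from its training curves."""
    return {v: centroid(cs) for v, cs in curves_by_variant.items()}


def classify(model, curve):
    """Assign the variant whose mean training curve is closest."""
    return min(model, key=lambda v: sq_dist(model[v], curve))


def run_demo(n_train=3, n_test=20, seed=0):
    rng = random.Random(seed)
    temps = [70 + 0.1 * i for i in range(201)]  # 70 to 90 degrees C
    # Hypothetical variants with made-up melting temperatures.
    variant_tm = {"variant_A": 78.0, "variant_B": 80.0, "variant_C": 82.0}

    def sample(tm):
        # A random temperature shift stands in for reaction-condition deviations.
        return melt_curve(tm, temps, shift=rng.gauss(0.0, 0.2))

    model = train(
        {v: [sample(tm) for _ in range(n_train)] for v, tm in variant_tm.items()}
    )

    correct = total = 0
    for v, tm in variant_tm.items():
        for _ in range(n_test):
            correct += classify(model, sample(tm)) == v
            total += 1
    return correct / total


accuracy = run_demo()
print(f"held-out accuracy: {accuracy:.2f}")
```

Because the training curves themselves carry the simulated condition shifts, the learned centroids absorb some of that variation. That is the sense in which tolerance is "learned" in this toy version. A real pipeline would need curve normalization and a classifier suited to many closely spaced variants.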