Evolutionary Bioinformatics (Mar 2019)

Vector Quantized Spectral Clustering Applied to Whole Genome Sequences of Plants

  • Aditya A Shastri,
  • Kapil Ahuja,
  • Milind B Ratnaparkhe,
  • Aditya Shah,
  • Aishwary Gagrani,
  • Anant Lal

DOI
https://doi.org/10.1177/1176934319836997
Journal volume & issue
Vol. 15

Abstract

We develop a Vector Quantized Spectral Clustering (VQSC) algorithm, a combination of spectral clustering (SC) and vector quantization (VQ) sampling, for grouping genome sequences of plants. The motivation is to use SC for its accuracy and VQ to make the algorithm computationally cheap (the complexity of SC is cubic in the input size). Although combining SC and VQ is not new, the novelty of our work lies in constructing the crucial similarity matrix in SC and in using k-medoids in VQ, both adapted to plant genome data. For soybean, we compare our approach with two commonly used techniques, the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) and Neighbor Joining (NJ). Experimental results show that VQSC significantly outperforms both techniques in cluster quality (an average improvement of 21% over UPGMA and 24% over NJ) and in computation time (an order of magnitude faster than both UPGMA and NJ).
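
The general idea of pairing VQ sampling with SC can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it makes several assumptions: a pairwise distance matrix `D` between sequences is already available, the similarity matrix is a generic Gaussian kernel (not the genome-specific similarity developed in the paper), and the final grouping of the spectral embedding reuses k-medoids where k-means is the more common choice. All function and parameter names (`vqsc`, `n_reps`, `sigma`, and so on) are hypothetical.

```python
import numpy as np

def farthest_first(D, m):
    """Deterministic seeding: start at item 0, then repeatedly add the
    item farthest from all items chosen so far."""
    chosen = [0]
    for _ in range(m - 1):
        chosen.append(int(np.argmax(D[:, chosen].min(axis=1))))
    return np.array(chosen)

def k_medoids(D, m, n_iter=50):
    """Alternating k-medoids on a precomputed distance matrix D (n x n).
    Returns the medoid indices and each item's medoid label (0..m-1)."""
    medoids = farthest_first(D, m)
    for _ in range(n_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new = medoids.copy()
        for j in range(m):
            members = np.flatnonzero(labels == j)
            if members.size:
                # New medoid: member with the smallest total distance
                # to the other members of its cluster.
                within = D[np.ix_(members, members)].sum(axis=1)
                new[j] = members[np.argmin(within)]
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, np.argmin(D[:, medoids], axis=1)

def spectral_clustering(D, k, sigma=None):
    """Normalized spectral clustering from distances, using a Gaussian
    similarity (an assumption; the paper builds its own similarity)."""
    if sigma is None:
        sigma = np.median(D[D > 0])  # heuristic bandwidth, an assumption
    W = np.exp(-(D ** 2) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))  # assumes no isolated item
    L = np.eye(len(D)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    U = vecs[:, :k]                    # k smallest eigenvectors
    U = U / np.linalg.norm(U, axis=1, keepdims=True)
    # Group the embedded rows; k-medoids is reused here for brevity.
    E = np.linalg.norm(U[:, None, :] - U[None, :, :], axis=-1)
    return k_medoids(E, k)[1]

def vqsc(D, n_reps, n_clusters, sigma=None):
    """VQ step picks n_reps medoids; SC runs only on those medoids;
    every item then inherits the cluster of its medoid."""
    medoids, assign = k_medoids(D, n_reps)
    rep_labels = spectral_clustering(D[np.ix_(medoids, medoids)],
                                     n_clusters, sigma)
    return rep_labels[assign]
```

In this sketch the eigendecomposition is performed on an `n_reps` x `n_reps` matrix instead of the full n x n matrix, which is where the saving over plain SC comes from when `n_reps` is much smaller than n. A call such as `vqsc(D, n_reps=50, n_clusters=4)` would return one cluster label per sequence; the values 50 and 4 are purely illustrative.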