Genomics Data (Mar 2016)

Next generation sequencing (NGS) database for tandem repeats with multiple pattern 2°-shaft multicore string matching

  • Chinta Someswara Rao
  • S. Viswanadha Raju

Journal volume & issue
Vol. 7
pp. 307 – 317

Abstract

Next generation sequencing (NGS) technologies have been rapidly adopted in biomedical and biological research in recent years. To provide a comprehensive NGS resource for this research, in this paper we consider 10 loci (repeat motifs): TAGA, TCAT, GAAT, AGAT, AGAA, GATA, TATC, CTTT, TCTG and TCTA. We then developed the NGS Tandem Repeat Database (TandemRepeatDB), which covers all of these loci across all chromosomes of the Homo sapiens, Callithrix jacchus, Chlorocebus sabaeus, Gorilla gorilla, Macaca fascicularis, Macaca mulatta, Nomascus leucogenys, Pan troglodytes, Papio anubis and Pongo abelii genome data sets. For each of the 10 simple sequence repeats (SSRs), we find the frequency of successive occurrences in each of these genome data sets on a chromosome-by-chromosome basis, using multiple pattern 2° shaft multicore string matching.

Keywords: NGS, SSR, TandemRepeatDB, Genome, String matching, chromosomes
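
The chromosome-by-chromosome counting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' 2° shaft multicore algorithm, which the abstract does not specify; a regular-expression scan is swapped in instead. The sketch assumes that "successive occurrence frequency" means the number of maximal runs in which a motif is repeated back to back a given number of times. The names `MOTIFS`, `tandem_run_histogram`, `per_chromosome` and the `min_copies` threshold are hypothetical and introduced only for illustration.

```python
import re
from collections import Counter

# The 10 repeat motifs listed in the abstract.
MOTIFS = ["TAGA", "TCAT", "GAAT", "AGAT", "AGAA",
          "GATA", "TATC", "CTTT", "TCTG", "TCTA"]


def tandem_run_histogram(sequence, motifs=MOTIFS, min_copies=2):
    """For each motif, map run length (consecutive copies) -> number of runs.

    Assumption: a "run" is a maximal, non-overlapping stretch of at least
    `min_copies` back-to-back copies, found by a left-to-right regex scan.
    """
    seq = sequence.upper()
    histogram = {}
    for motif in motifs:
        # e.g. (?:TAGA){2,} matches two or more consecutive copies of TAGA
        pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
        histogram[motif] = Counter(
            len(match.group(0)) // len(motif)
            for match in pattern.finditer(seq)
        )
    return histogram


def per_chromosome(chromosomes, motifs=MOTIFS, min_copies=2):
    """Apply the scan to each chromosome in a {name: sequence} mapping."""
    return {
        name: tandem_run_histogram(seq, motifs, min_copies)
        for name, seq in chromosomes.items()
    }


# Toy input, not real genome data.
toy = {"chr_demo": "CCTAGATAGATAGACCGATAGATACC"}
for name, table in per_chromosome(toy).items():
    for motif, runs in table.items():
        if runs:
            print(name, motif, dict(runs))
```

Because each chromosome is scanned independently, the calls inside `per_chromosome` could be spread across cores, for example with `concurrent.futures.ProcessPoolExecutor` from the standard library; that would be one possible reading of the "multicore" aspect, not a description of the published method.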