Informatics in Medicine Unlocked (Jan 2018)

Computational analysis of next generation sequencing data and its applications in clinical oncology

  • Rucha M. Wadapurkar,
  • Renu Vyas

Journal volume & issue
Vol. 11
pp. 75 – 82

Abstract

Next generation sequencing (NGS) is a major advance in sequencing technology, as it enables high-throughput sequencing of genes at low cost. NGS platforms such as Illumina, Roche, and ABI/SOLiD are used for wet-lab analysis of NGS data, and computational tools such as BWA, Bowtie, Galaxy, and SanGeniX are used for dry-lab analysis. One important application of NGS data is early disease diagnosis, especially in cancer, which was not previously possible with conventional sequencing technologies such as Sanger sequencing. Because researchers can now sequence the whole genome, exome, or transcriptome, NGS can identify mutations that conventional sequencing technologies cannot. Exome sequencing is preferred, as a higher number of mutations are found in the exome part of genes. The present comprehensive review covers the complete NGS data analysis workflow, including alignment of NGS reads, identification and annotation of mutations, and visualization. It also discusses software tools for variant identification and annotation, the evaluation of structural variation in NGS data, and different DNA sequencing technologies. In clinical oncology, NGS has already proven its usefulness, and the mortality rate has been reduced, as doctors can now suggest an appropriate treatment to their patients by checking the complete genomic profile. However, data storage and the complexity of interpreting the enormous amounts of data obtained with NGS remain computational challenges for researchers, because each sample generates a number of different, very large analysis files, from the raw sequence read file to the final result file.
The resulting NGS data are very complex, and their interpretation requires expert bioinformatics assistance: a large number of mutations are identified from samples, but distinguishing the clinically significant mutations among them, with appropriate use of validation methods, is a challenging task. This review is intended to provide researchers with a complete overview of NGS, along with knowledge of how the tools are employed and insight into the identification and interpretation of cancer mutations for clinical diagnostics.

Keywords: Next generation sequencing, Mutations, Cancer, Sanger sequencing, Variant identification and annotation, Data analysis