Review of Current Methods, Applications, and Data Management for the Bioinformatics Analysis of Whole Exome Sequencing

Riyue Bao; Lei Huang; Jorge Andrade; Wei Tan; Warren A. Kibbe; Hongmei Jiang; Gang Feng

doi:10.4137/CIN.S13779

Cancer Informatics (Jan 2014)

Affiliations

Riyue Bao: Center for Research Informatics, The University of Chicago, Chicago, IL, USA.
Lei Huang: Center for Research Informatics, The University of Chicago, Chicago, IL, USA.
Jorge Andrade: Center for Research Informatics, The University of Chicago, Chicago, IL, USA.
Wei Tan: IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA.
Warren A. Kibbe: Biomedical Informatics Center (NUBIC), Clinical and Translational Sciences Institute (NUCATS), Northwestern University, Chicago, IL, USA.
Hongmei Jiang: Department of Statistics, Northwestern University, Evanston, IL, USA.
Gang Feng: Biomedical Informatics Center (NUBIC), Clinical and Translational Sciences Institute (NUCATS), Northwestern University, Chicago, IL, USA.

DOI: https://doi.org/10.4137/CIN.S13779
Journal volume & issue: Vol. 13s2

Abstract

The advent of next-generation sequencing technologies has greatly advanced the study of human diseases at the genomic, transcriptomic, and epigenetic levels. Exome sequencing, in which the coding regions of the genome are captured and sequenced at high depth, has proven to be a cost-effective method to detect disease-causing variants and discover gene targets. In this review, we outline the general framework of whole exome sequence data analysis. We focus on established bioinformatics tools and applications that support five analytical steps: raw data quality assessment, preprocessing, alignment, post-processing, and variant analysis (detection, annotation, and prioritization). We evaluate the performance of open-source alignment programs and variant calling tools using simulated and benchmark datasets, and highlight the challenges posed by the lack of concordance among variant detection tools. Based on these results, we recommend adopting multiple tools and resources to reduce false positives and increase the sensitivity of variant calling. In addition, we briefly discuss the current status of, and solutions for, big data management, analysis, and summarization in the field of bioinformatics.
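
The recommendation to combine several variant callers can be illustrated with a minimal sketch. It is not the authors' method: the caller names, the toy variants, and the helper `combine_calls` are all hypothetical, and a variant is reduced to a (chromosome, position, reference allele, alternate allele) tuple. The sketch only shows the trade-off the abstract alludes to: requiring agreement among more tools tends to remove false positives, while accepting any tool's call tends to raise sensitivity.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# A variant reduced to (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def combine_calls(calls_by_tool: Dict[str, Set[Variant]], min_support: int) -> Set[Variant]:
    """Return the variants reported by at least `min_support` tools (hypothetical helper)."""
    # Each tool contributes a set, so a variant is counted at most once per tool.
    support = Counter(v for calls in calls_by_tool.values() for v in calls)
    return {v for v, n in support.items() if n >= min_support}


# Toy call sets from three hypothetical callers (illustrative data only).
calls = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")},
    "caller_c": {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T")},
}

union = combine_calls(calls, min_support=1)               # most sensitive
majority = combine_calls(calls, min_support=2)            # compromise
consensus = combine_calls(calls, min_support=len(calls))  # most conservative
```

With this toy input, `union` holds all three variants, `majority` holds the two reported by at least two callers, and `consensus` holds only the one that every caller reports. The threshold that suits a real study depends on how concordant the chosen callers are on that data.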

Published in Cancer Informatics

ISSN: 1176-9351 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://journals.sagepub.com/home/cix
