GCAC: galaxy workflow system for predictive model building for virtual screening

Deepak R. Bharti; Anmol J. Hemrom; Andrew M. Lynn

doi:10.1186/s12859-018-2492-8

BMC Bioinformatics (Feb 2019)

Affiliations

Deepak R. Bharti: School of Computational and Integrative Sciences, Jawaharlal Nehru University
Anmol J. Hemrom: School of Computational and Integrative Sciences, Jawaharlal Nehru University
Andrew M. Lynn: School of Computational and Integrative Sciences, Jawaharlal Nehru University

Journal volume & issue: Vol. 19, no. S13
pp. 199 – 206

Abstract

Background

Traditional drug discovery approaches are time-consuming, tedious and expensive. Identifying a potential drug-like molecule with high confidence using high-throughput screening (HTS) is a persistent challenge in drug discovery and cheminformatics. Only a small percentage of molecules that pass through the clinical trial phases receive FDA approval, and the whole process takes 10–12 years and millions of dollars of investment. Inconsistency in HTS also makes reproducible results difficult to obtain. Reproducibility is highly desirable in computational research as a means of evaluating scientific claims and published findings. This paper describes the development and availability of a knowledge-based predictive model building system that uses the R Statistical Computing Environment and ensures its reproducibility through the Galaxy workflow system.

Results

We describe a web-enabled data mining pipeline that employs reproducible research approaches to address the issue of tool availability in high-throughput virtual screening. The pipeline, named “Galaxy for Compound Activity Classification (GCAC)”, covers descriptor calculation, feature selection, model building, and screening to extract potent candidates. It combines the capabilities of R statistical packages and literate programming tools within a workflow system environment with automated configuration.

Conclusion

GCAC can serve as a standard for screening drug candidates through predictive model building in the Galaxy environment, with easy installation and reproducibility. A demo site of the tool is available at http://ccbb.jnu.ac.in/gcac
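The pipeline stages named in the Results (descriptor calculation, feature selection, model building, screening) can be illustrated with a toy sketch. GCAC itself implements these stages with R packages inside Galaxy; the Python below is not its implementation. It is a stdlib-only example that assumes descriptors have already been computed as numeric vectors, because descriptor calculation needs a cheminformatics toolkit and is omitted here. Variance filtering and logistic regression are stand-ins chosen for brevity. The data, function names, and the 0.5 activity threshold are all hypothetical.

```python
import math
from statistics import pvariance

# Hypothetical sketch only: not GCAC's R-based implementation.


def select_features(rows, min_variance=1e-6):
    """Return indices of descriptor columns whose variance exceeds min_variance.

    Constant columns carry no information for classification, so they are dropped.
    """
    n_cols = len(rows[0])
    return [j for j in range(n_cols)
            if pvariance([row[j] for row in rows]) > min_variance]


def project(rows, keep):
    """Restrict each descriptor vector to the selected column indices."""
    return [[row[j] for j in keep] for row in rows]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one training compound.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def screen(candidates, keep, w, b, threshold=0.5):
    """Score candidate descriptor vectors; keep those predicted active, best first."""
    hits = []
    for name, desc in candidates.items():
        x = [desc[j] for j in keep]
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        if p >= threshold:
            hits.append((name, p))
    return sorted(hits, key=lambda t: t[1], reverse=True)


# Hypothetical precomputed descriptors (column 2 is constant and gets dropped).
train_X = [[0.1, 1.0, 5.0], [0.2, 0.9, 5.0], [0.9, 0.1, 5.0], [0.8, 0.2, 5.0]]
train_y = [0, 0, 1, 1]  # 1 = active, 0 = inactive

keep = select_features(train_X)
w, b = train_logistic(project(train_X, keep), train_y)
hits = screen({"cand_A": [0.85, 0.15, 5.0], "cand_B": [0.15, 0.95, 5.0]}, keep, w, b)
print(keep, [name for name, _ in hits])  # prints [0, 1] ['cand_A']
```

The key design point is that the feature indices chosen on the training set (`keep`) are reused unchanged when screening new candidates, so the model always sees the same descriptor columns it was trained on.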

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/