Integrative analysis and machine learning on cancer genomics data using the Cancer Systems Biology Database (CancerSysDB)

Rasmus Krempel; Pranav Kulkarni; Annie Yim; Ulrich Lang; Bianca Habermann; Peter Frommolt

doi:10.1186/s12859-018-2157-7

BMC Bioinformatics (Apr 2018)


Affiliations

Rasmus Krempel: Regional Computing Center of the University of Cologne (RRZK)
Pranav Kulkarni: Bioinformatics Facility, CECAD Research Center, University of Cologne
Annie Yim: Institut de Biologie du Développement, Aix-Marseille University
Ulrich Lang: Regional Computing Center of the University of Cologne (RRZK)
Bianca Habermann: Institut de Biologie du Développement, Aix-Marseille University
Peter Frommolt: Bioinformatics Facility, CECAD Research Center, University of Cologne

Journal volume & issue: Vol. 19, no. 1
pp. 1 – 10

Abstract


Background: Recent cancer genome studies covering many human cancer types have relied on multiple high-throughput molecular technologies. Given the vast amount of data that has been generated, surprisingly few databases facilitate access to these data and make them available to the broad research community for flexible analysis queries. If used in their entirety and provided at a high structural level, these data can feed continually growing databases that have enormous potential to serve as a basis for machine learning technologies, with the goal of supporting research and healthcare through predictions of clinically relevant traits.

Results: We have developed the Cancer Systems Biology Database (CancerSysDB), a resource for highly flexible queries and analysis of cancer-related data across multiple data types and multiple studies. Any center can adopt the CancerSysDB to organize its locally acquired data and to integrate them with publicly available data from multiple studies. A publicly available main instance of the CancerSysDB supports highly flexible queries across multiple data types, as demonstrated by several relevant use cases. In addition, we demonstrate how the CancerSysDB can be used for predictive cancer classification based on whole-exome data from 9091 patients in The Cancer Genome Atlas (TCGA) research network.

Conclusions: Our database has the potential to be used for large-scale integrative queries and predictive analytics of clinically relevant traits.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
