Deep learning-based approaches for multi-omics data integration and analysis

Jenna L. Ballard; Zexuan Wang; Wenrui Li; Li Shen; Qi Long

doi:10.1186/s13040-024-00391-z

BioData Mining (Oct 2024)

Affiliations

Jenna L. Ballard: Graduate Group in Genomics and Computational Biology, Perelman School of Medicine, University of Pennsylvania
Zexuan Wang: Graduate Group in Applied Mathematics and Computational Science, University of Pennsylvania
Wenrui Li: Department of Statistics, University of Connecticut
Li Shen: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania
Qi Long: Department of Biostatistics, Epidemiology and Informatics, Perelman School of Medicine, University of Pennsylvania

DOI: https://doi.org/10.1186/s13040-024-00391-z
Journal volume & issue: Vol. 17, no. 1
pp. 1 – 29

Abstract

Background: The rapid growth of deep learning, together with the vast and ever-growing amount of available data, has provided ample opportunity for advances in the fusion and analysis of complex and heterogeneous data types. Different data modalities provide complementary information that can be leveraged to gain a more complete understanding of each subject. In the biomedical domain, multi-omics data includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics, etc.) and imaging (radiomics, pathomics) modalities which, when combined, have the potential to improve performance on prediction, classification, clustering and other tasks. Deep learning encompasses a wide variety of methods, each of which has certain strengths and weaknesses for multi-omics integration.

Method: In this review, we categorize recent deep learning-based approaches by their basic architectures and discuss their unique capabilities in relation to one another. We also discuss some emerging themes advancing the field of multi-omics integration.

Results: Deep learning-based multi-omics integration methods were categorized broadly into non-generative (feedforward neural networks, graph convolutional neural networks, and autoencoders) and generative (variational methods, generative adversarial models, and a generative pretrained model). Generative methods have the advantage of being able to impose constraints on the shared representations to enforce certain properties or incorporate prior knowledge. They can also be used to generate or impute missing modalities. Recent advances achieved by these methods include the ability to handle incomplete data and to go beyond the traditional molecular omics data types by integrating other modalities such as imaging data.

Conclusion: We expect to see further growth in methods that can handle missingness, as this is a common challenge in working with complex and heterogeneous data. Additionally, methods that integrate more data types are expected to improve performance on downstream tasks by capturing a comprehensive view of each sample.

Published in BioData Mining

ISSN: 1756-0381 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Mathematics: Analysis
Website: https://biodatamining.biomedcentral.com/
