Merged Affinity Network Association Clustering: Joint multi-omic/clinical clustering to identify disease endotypes
Scott R. Tyler,
Yoojin Chun,
Victoria M. Ribeiro,
Galina Grishina,
Alexander Grishin,
Gabriel E. Hoffman,
Anh N. Do,
Supinda Bunyavanich
Affiliations
Scott R. Tyler
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Yoojin Chun
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Victoria M. Ribeiro
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Galina Grishina
Division of Allergy and Immunology, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Alexander Grishin
Division of Allergy and Immunology, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Gabriel E. Hoffman
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Anh N. Do
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
Supinda Bunyavanich
Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Division of Allergy and Immunology, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Corresponding author
Summary: Although clinical and laboratory data have long been used to guide medical practice, this information is rarely integrated with multi-omic data to identify endotypes. We present Merged Affinity Network Association Clustering (MANAclust), a coding-free, automated pipeline that integrates categorical and numeric data spanning clinical and multi-omic profiles for unsupervised clustering to identify disease subsets. Using simulations and real-world data from The Cancer Genome Atlas, we demonstrate that MANAclust’s feature selection algorithms are accurate and outperform competing methods. We also apply MANAclust to a clinically and multi-omically phenotyped asthma cohort. MANAclust identifies clinically and molecularly distinct clusters, including heterogeneous groups of “healthy controls” and viral and allergy-driven subsets of asthmatic subjects. We also find that subjects with similar clinical presentations have disparate molecular profiles, highlighting the need for additional testing to uncover asthma endotypes. This work facilitates data-driven personalized medicine through integration of clinical parameters with multi-omics. MANAclust is freely available at https://bitbucket.org/scottyler892/manaclust/src/master/.