Application of Aligned-UMAP to longitudinal biomedical studies
Anant Dadu,
Vipul K. Satone,
Rachneet Kaur,
Mathew J. Koretsky,
Hirotaka Iwaki,
Yue A. Qi,
Daniel M. Ramos,
Brian Avants,
Jacob Hesterman,
Roger Gunn,
Mark R. Cookson,
Michael E. Ward,
Andrew B. Singleton,
Roy H. Campbell,
Mike A. Nalls,
Faraz Faghri
Affiliations
Anant Dadu
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA; Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; Data Tecnica International, Washington, DC 20037, USA
Vipul K. Satone
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
Rachneet Kaur
Department of Industrial and Enterprise Systems Engineering, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
Mathew J. Koretsky
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
Hirotaka Iwaki
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; Data Tecnica International, Washington, DC 20037, USA; Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
Yue A. Qi
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
Daniel M. Ramos
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
Brian Avants
Invicro, Image Analysis, Needham, MA, USA
Jacob Hesterman
Invicro, Image Analysis, Needham, MA, USA
Roger Gunn
Invicro, Image Analysis, Needham, MA, USA
Mark R. Cookson
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
Michael E. Ward
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
Andrew B. Singleton
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
Roy H. Campbell
Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61820, USA
Mike A. Nalls
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; Data Tecnica International, Washington, DC 20037, USA; Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA
Faraz Faghri
Center for Alzheimer’s and Related Dementias (CARD), National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA; Data Tecnica International, Washington, DC 20037, USA; Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD 20892, USA; Corresponding author
Summary: High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and projection (UMAP) algorithm, can visualize high-dimensional longitudinal datasets. We demonstrated its utility for identifying patterns and trajectories within large datasets in the biological sciences. We found that the algorithm parameters play a crucial role and must be tuned carefully to realize the algorithm's full potential. We also discussed key points to remember and directions for future extensions of Aligned-UMAP. Further, we made our code open source to enhance the reproducibility and applicability of our work. We believe this benchmarking study will become more important as more high-dimensional longitudinal data become available in biomedical research.
The bigger picture: Longitudinal multi-dimensional biological datasets are ubiquitous. These datasets are essential to understanding disease progression, identifying subtypes, and discovering drugs. Discovering meaningful patterns or disease pathophysiologies in these datasets is challenging because their high dimensionality makes hidden patterns difficult to visualize. In this work, we applied Aligned-UMAP to a broad spectrum of clinical, imaging, proteomics, and single-cell datasets. Aligned-UMAP reveals time-dependent hidden patterns when the embeddings are color coded with the metadata. Altogether, based on its ease of use and our evaluation of its performance on different modalities, we anticipate that Aligned-UMAP will be a valuable tool for the biomedical community.