Mixture of Regressions with Multivariate Responses for Discovering Subtypes in Alzheimer’s Biomarkers with Detection Limits

Ganzhong Tian; John Hanfelt; James Lah; Benjamin B. Risk

doi:10.1080/26941899.2024.2309403

Data Science in Science (Dec 2024)


Affiliations

Ganzhong Tian: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
John Hanfelt: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA
James Lah: Department of Neurology, Emory University School of Medicine, Atlanta, Georgia, USA
Benjamin B. Risk: Department of Biostatistics and Bioinformatics, Emory University, Atlanta, Georgia, USA

Journal volume & issue: Vol. 3, no. 1

Abstract


Apart from autopsy, there is no gold standard for the diagnosis of Alzheimer’s disease (AD), which motivates the use of unsupervised learning. A mixture of regressions is an unsupervised method that can simultaneously identify clusters from multiple biomarkers while learning within-cluster demographic effects. Cerebrospinal fluid (CSF) biomarkers for AD have detection limits, which create additional challenges. We apply a mixture of regressions with a multivariate truncated Gaussian distribution (also called a censored multivariate Gaussian mixture of regressions or a mixture of multivariate Tobit regressions) to over 3000 participants from the Emory Goizueta Alzheimer’s Disease Research Center and the Emory Healthy Brain Study to examine amyloid-beta peptide 1–42 (Abeta42), total tau protein, and phosphorylated tau protein in CSF, all of which have known detection limits. We address three gaps in the literature on the mixture of regressions with a truncated multivariate Gaussian distribution: software availability, inference, and clustering accuracy. We discovered three clusters that tend to align with an AD group, a normal control profile, and non-AD pathology. The CSF profiles differed by race, gender, and the genetic marker ApoE4, highlighting the importance of considering demographic factors in unsupervised learning with detection limits. Notably, African American participants in the AD-like group had significantly lower tau burden.
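As a hedged illustration of how detection limits enter a mixture of regressions, the sketch below computes one observation's mixture log-likelihood and its posterior cluster probabilities (the quantities used in an EM E-step) for a single biomarker. This is a univariate simplification of the multivariate model described above, not the authors' software: the function names `censored_loglik` and `mixture_posteriors`, the linear mean per cluster, and the convention that values at or beyond a limit are censored are all assumptions for illustration. In the multivariate case, an observation with some coordinates censored contributes the density of its observed coordinates times the conditional multivariate normal probability of its censored coordinates, which requires a multivariate normal CDF and is omitted here.

```python
import math

SQRT2 = math.sqrt(2.0)


def censored_loglik(y, mean, sigma, lower, upper):
    """Log-likelihood of one response under N(mean, sigma^2) censored at detection limits.

    Assumed convention for this sketch: values at or below `lower` are
    left-censored and values at or above `upper` are right-censored.
    """
    if y <= lower:
        # P(Y <= lower): normal CDF, written with erfc for numerical stability
        z = (lower - mean) / sigma
        return math.log(0.5 * math.erfc(-z / SQRT2))
    if y >= upper:
        # P(Y >= upper): normal survival function
        z = (upper - mean) / sigma
        return math.log(0.5 * math.erfc(z / SQRT2))
    # Observed value strictly inside the limits: ordinary normal log-density
    z = (y - mean) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def mixture_posteriors(y, x, weights, betas, sigmas, lower, upper):
    """Return (posterior cluster probabilities, observation log-likelihood).

    Each cluster k has its own regression coefficients betas[k], so the
    covariate effects (e.g. intercept, age) differ between clusters.
    """
    log_terms = []
    for w, beta, s in zip(weights, betas, sigmas):
        mean = sum(b * xi for b, xi in zip(beta, x))
        log_terms.append(math.log(w) + censored_loglik(y, mean, s, lower, upper))
    # log-sum-exp gives a numerically stable mixture log-likelihood
    m = max(log_terms)
    total = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return [math.exp(t - total) for t in log_terms], total


# Hypothetical two-cluster parameters; x = [intercept, standardized age].
# Here y equals the lower limit, so it is treated as left-censored.
post, ll = mixture_posteriors(
    y=0.2, x=[1.0, 0.5], weights=[0.4, 0.6],
    betas=[[1.0, 0.3], [-0.5, 0.1]], sigmas=[0.8, 1.0],
    lower=0.2, upper=5.0)
```

The censored branches replace the density with a tail probability, so a value reported only as "below the detection limit" still informs cluster membership instead of being dropped or imputed.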

Published in Data Science in Science

ISSN: 2694-1899 (Online)
Publisher: Taylor & Francis Group
Country of publisher: United Kingdom
LCC subjects: Science: Science (General); Science: Mathematics: Probabilities. Mathematical statistics
Website: https://www.tandfonline.com/UDSS
