Overcoming cohort heterogeneity for the prediction of subclinical cardiovascular disease risk
Adam S. Chan, Songhua Wu, Stephen T. Vernon, Owen Tang, Gemma A. Figtree, Tongliang Liu, Jean Y.H. Yang, Ellis Patrick
Affiliations
Adam S. Chan
School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia; Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia
Songhua Wu
School of Computer Science, The University of Sydney, Sydney, NSW, Australia
Stephen T. Vernon
Kolling Institute of Medical Research, Royal North Shore Hospital, Sydney, NSW, Australia
Owen Tang
Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia; Kolling Institute of Medical Research, Royal North Shore Hospital, Sydney, NSW, Australia
Gemma A. Figtree
Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia; Kolling Institute of Medical Research, Royal North Shore Hospital, Sydney, NSW, Australia
Tongliang Liu
Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia; School of Computer Science, The University of Sydney, Sydney, NSW, Australia
Jean Y.H. Yang
School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia; Charles Perkins Centre, The University of Sydney, Sydney, NSW, Australia; Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia; Corresponding author
Ellis Patrick
School of Mathematics and Statistics, The University of Sydney, Sydney, NSW, Australia; Sydney Precision Data Science Centre, The University of Sydney, Sydney, NSW, Australia; Westmead Medical Institute, Sydney, NSW, Australia; Corresponding author
Summary: Cardiovascular disease (CVD) remains a leading cause of mortality, with an estimated half a billion people affected in 2019. However, detecting signals between specific pathophysiology and coronary plaque phenotypes in complex multi-omic discovery datasets is challenging because of the diversity of individuals and their risk factors. Given the cohort heterogeneity present in those with coronary artery disease (CAD), we illustrate several methods, both knowledge-guided and data-driven, for identifying subcohorts of individuals with subclinical CAD and distinct metabolomic signatures. We then demonstrate that utilizing these subcohorts can improve the prediction of subclinical CAD and facilitate the discovery of novel biomarkers of subclinical disease. Analyses that acknowledge cohort heterogeneity by identifying and utilizing these subcohorts may advance our understanding of CVD and support more effective preventative treatments, reducing the burden of this disease on individuals and on society as a whole.