Multi-modality machine learning predicting Parkinson’s disease

Mary B. Makarious; Hampton L. Leonard; Dan Vitale; Hirotaka Iwaki; Lana Sargent; Anant Dadu; Ivo Violich; Elizabeth Hutchins; David Saffo; Sara Bandres-Ciga; Jonggeol Jeff Kim; Yeajin Song; Melina Maleknia; Matt Bookman; Willy Nojopranoto; Roy H. Campbell; Sayed Hadi Hashemi; Juan A. Botia; John F. Carter; David W. Craig; Kendall Van Keuren-Jensen; Huw R. Morris; John A. Hardy; Cornelis Blauwendraat; Andrew B. Singleton; Faraz Faghri; Mike A. Nalls

doi:10.1038/s41531-022-00288-w

npj Parkinson's Disease (Apr 2022)

Affiliations

Mary B. Makarious: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Hampton L. Leonard: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Dan Vitale: Center for Alzheimer’s and Related Dementias, National Institutes of Health
Hirotaka Iwaki: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Lana Sargent: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Anant Dadu: Department of Computer Science, University of Illinois at Urbana-Champaign
Ivo Violich: Institute of Translational Genomics, University of Southern California
Elizabeth Hutchins: Neurogenomics Division, Translational Genomics Research Institute (TGen)
David Saffo: Khoury College of Computer Sciences, Northeastern University
Sara Bandres-Ciga: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Jonggeol Jeff Kim: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Yeajin Song: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Melina Maleknia: Georgia Institute of Technology
Matt Bookman: Verily Life Sciences
Willy Nojopranoto: Verily Life Sciences
Roy H. Campbell: Department of Computer Science, University of Illinois at Urbana-Champaign
Sayed Hadi Hashemi: Department of Computer Science, University of Illinois at Urbana-Champaign
Juan A. Botia: Department of Molecular Neuroscience, UCL Queen Square Institute of Neurology
John F. Carter: ModelOp
David W. Craig: Institute of Translational Genomics, University of Southern California
Kendall Van Keuren-Jensen: Neurogenomics Division, Translational Genomics Research Institute (TGen)
Huw R. Morris: Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology
John A. Hardy: Department of Clinical and Movement Neurosciences, UCL Queen Square Institute of Neurology
Cornelis Blauwendraat: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Andrew B. Singleton: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Faraz Faghri: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health
Mike A. Nalls: Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health

DOI: https://doi.org/10.1038/s41531-022-00288-w
Journal volume & issue: Vol. 8, no. 1, pp. 1–13

Abstract

Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key to moving forward. We build upon previous work to deliver multimodal predictions of Parkinson’s disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug–gene interactions. We performed automated ML on multimodal data from the Parkinson’s Progression Markers Initiative (PPMI). After selecting the best-performing algorithm, all PPMI data were used to tune the selected model. The model was validated in the Parkinson’s Disease Biomarkers Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then validated on external data (PDBP, AUC 85.03%). Optimizing classification thresholds increased diagnostic prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single-biomarker paradigm. The University of Pennsylvania Smell Identification Test (UPSIT) and the polygenic risk score (PRS) contributed most to the predictive power of the model, but their accuracy is supplemented by many smaller-effect transcripts and risk SNPs. Our model is best suited to identifying large groups of individuals within a health registry or biobank to monitor and prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.
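The two evaluation ideas mentioned in the abstract, AUC and classification-threshold optimization, can be illustrated with a minimal sketch. This is not the GenoML implementation: the function names, the toy labels and scores, and the choice of balanced accuracy as the optimization criterion are all assumptions made for illustration, since the abstract does not state which criterion was used.

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def balanced_accuracy(labels: Sequence[int], scores: Sequence[float],
                      threshold: float) -> float:
    """Mean of sensitivity and specificity when predicting 'case' if score >= threshold."""
    n_cases = sum(1 for y in labels if y == 1)
    n_controls = len(labels) - n_cases
    true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    true_neg = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return 0.5 * (true_pos / n_cases + true_neg / n_controls)


def best_threshold(labels: Sequence[int], scores: Sequence[float]) -> Tuple[float, float]:
    """Scan every observed score as a cut-off; return (threshold, balanced accuracy).

    Assumption: balanced accuracy is the criterion; other criteria are possible.
    On ties, the lowest threshold is kept.
    """
    best_t, best_ba = 0.5, balanced_accuracy(labels, scores, 0.5)
    for t in sorted(set(scores)):
        ba = balanced_accuracy(labels, scores, t)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t, best_ba


# Hypothetical toy data: 1 = PD case, 0 = control; scores are predicted probabilities.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.10, 0.20, 0.30, 0.45, 0.35, 0.40, 0.70, 0.90]

auc = roc_auc(toy_labels, toy_scores)
default_ba = balanced_accuracy(toy_labels, toy_scores, 0.5)
tuned_t, tuned_ba = best_threshold(toy_labels, toy_scores)
print(f"AUC={auc:.3f}  BA@0.5={default_ba:.3f}  BA@{tuned_t:.2f}={tuned_ba:.3f}")
```

On this toy data, moving the cut-off away from the default 0.5 raises balanced accuracy while the AUC, which does not depend on any threshold, stays the same. In practice the threshold should be chosen on training or withheld data and then applied unchanged to the external cohort.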

Published in npj Parkinson's Disease

ISSN: 2373-8057 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry: Neurology. Diseases of the nervous system
Website: https://www.nature.com/npjparkd/
