Resample aggregating improves the generalizability of connectome predictive modeling

David O'Connor; Evelyn M.R. Lake; Dustin Scheinost; R. Todd Constable

NeuroImage (Aug 2021)

Affiliations

David O'Connor: Department of Biomedical Engineering, Yale University, United States; Corresponding author.
Evelyn M.R. Lake: Department of Radiology and Biomedical Imaging, Yale School of Medicine, United States
Dustin Scheinost: Department of Biomedical Engineering, Yale University, United States; Department of Radiology and Biomedical Imaging, Yale School of Medicine, United States; Department of Statistics & Data Science, Yale University, United States; Child Study Center, Yale School of Medicine, United States
R. Todd Constable: Department of Biomedical Engineering, Yale University, United States; Department of Radiology and Biomedical Imaging, Yale School of Medicine, United States; Department of Neurosurgery, Yale School of Medicine, United States

Journal volume & issue: Vol. 236
p. 118044

Abstract

It is a longstanding goal of neuroimaging to produce reliable, generalizable models of brain-behavior relationships. More recently, data-driven predictive models have become popular. However, overfitting is a common problem with statistical models and impedes model generalization. Cross-validation (CV) is often used to estimate expected model performance within-sample. Yet the best way to generate brain-behavior models and apply them out-of-sample, on an unseen dataset, is unclear. As a solution, this study proposes an ensemble learning method, resample aggregating, that encompasses both model parameter estimation and feature selection. Here we investigate resample-aggregated models for estimating fluid intelligence (fIQ) from fMRI-based functional connectivity (FC) data. We take advantage of two large, openly available datasets: the Human Connectome Project (HCP) and the Philadelphia Neurodevelopmental Cohort (PNC). We generate aggregated and non-aggregated models of fIQ in the HCP using the Connectome Predictive Modeling (CPM) framework. Over various train-test splits, these models are evaluated in-sample, on left-out HCP data, and out-of-sample, on PNC data. We find that a resample-aggregated model performs best both within- and out-of-sample. We also find that feature selection can vary substantially within-sample. More robust feature selection methods, as detailed here, are needed to improve the cross-sample performance of CPM-based brain-behavior models.
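
To make the idea of resample aggregating over both feature selection and parameter estimation concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation. It assumes a simplified CPM variant in which edges are selected by a fixed correlation threshold (`r_thresh`, a hypothetical parameter) rather than by a p-value, a single combined network strength is used as the predictor, and resampling is done by bootstrapping subjects. All function names are hypothetical.

```python
import numpy as np

def edge_behavior_corr(X, y):
    """Pearson r between each edge (column of X) and the behavior y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant edges (or constant y) get r = 0
    return (Xc * yc[:, None]).sum(axis=0) / denom

def network_strength(X, pos, neg):
    """Summed positive-edge strength minus summed negative-edge strength."""
    return X[:, pos].sum(axis=1) - X[:, neg].sum(axis=1)

def fit_cpm(X, y, r_thresh=0.2):
    """Simplified CPM: threshold edges, then regress y on network strength."""
    r = edge_behavior_corr(X, y)
    pos, neg = r > r_thresh, r < -r_thresh  # assumed selection rule
    design = np.column_stack([network_strength(X, pos, neg), np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return pos, neg, coef

def predict_cpm(model, X):
    pos, neg, coef = model
    return coef[0] * network_strength(X, pos, neg) + coef[1]

def fit_bagged_cpm(X, y, n_boot=50, r_thresh=0.2, seed=0):
    """Refit feature selection AND regression on each bootstrap resample."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # sample subjects with replacement
        models.append(fit_cpm(X[idx], y[idx], r_thresh))
    return models

def predict_bagged(models, X):
    """Aggregate by averaging the predictions of all resampled models."""
    return np.mean([predict_cpm(m, X) for m in models], axis=0)

def selection_frequency(models):
    """Fraction of resamples in which each edge was selected."""
    return np.mean([m[0] | m[1] for m in models], axis=0)
```

Under these assumptions, `fit_bagged_cpm(X_train, y_train)` would be fit on one sample (for example, a training split), and `predict_bagged(models, X_other)` applied either to left-out subjects or to a separate dataset with the same edge definitions. `selection_frequency(models)` gives a rough view of how stable edge selection is across resamples.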

Published in NeuroImage

ISSN: 1053-8119 (Print); 1095-9572 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Neurosciences. Biological psychiatry. Neuropsychiatry
Website: https://www.journals.elsevier.com/neuroimage
