Removing the effects of the site in brain imaging machine-learning – Measurement and extendable benchmark
Aleix Solanes,
Corentin J Gosling,
Lydia Fortea,
María Ortuño,
Elisabet Lopez-Soley,
Sara Llufriu,
Santiago Madero,
Eloy Martinez-Heras,
Edith Pomarol-Clotet,
Elisabeth Solana,
Eduard Vieta,
Joaquim Radua
Affiliations
Aleix Solanes
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Department of Psychiatry and Forensic Medicine, Autonomous University of Barcelona, Barcelona, Spain
Corentin J Gosling
DysCo Lab, Paris Nanterre University, Nanterre, France; Laboratoire de Psychopathologie et Processus de Santé, Université de Paris, Paris, France
Lydia Fortea
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; University of Barcelona, Barcelona, Spain
María Ortuño
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain
Elisabet Lopez-Soley
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; University of Barcelona, Barcelona, Spain; Center of Neuroimmunology, Laboratory of Advanced Imaging in Neuroimmunological Diseases, Hospital Clinic Barcelona
Sara Llufriu
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; University of Barcelona, Barcelona, Spain; Center of Neuroimmunology, Laboratory of Advanced Imaging in Neuroimmunological Diseases, Hospital Clinic Barcelona
Santiago Madero
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; University of Barcelona, Barcelona, Spain; Barcelona Bipolar Disorders and Depressive Unit, Institute of Neurosciences, Hospital Clinic, Barcelona, Spain
Eloy Martinez-Heras
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; University of Barcelona, Barcelona, Spain; Center of Neuroimmunology, Laboratory of Advanced Imaging in Neuroimmunological Diseases, Hospital Clinic Barcelona
Edith Pomarol-Clotet
Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; FIDMAG Germanes Hospitalàries Research Foundation, Barcelona, Spain; Benito Menni CASM, Sant Boi de Llobregat, Barcelona, Spain
Elisabeth Solana
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; University of Barcelona, Barcelona, Spain; Center of Neuroimmunology, Laboratory of Advanced Imaging in Neuroimmunological Diseases, Hospital Clinic Barcelona
Eduard Vieta
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; University of Barcelona, Barcelona, Spain; Barcelona Bipolar Disorders and Depressive Unit, Institute of Neurosciences, Hospital Clinic, Barcelona, Spain
Joaquim Radua
Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), Barcelona, Spain; Biomedical Network Research Centre on Mental Health (CIBERSAM), Instituto de Salud Carlos III, Madrid, Spain; University of Barcelona, Barcelona, Spain; Department of Psychosis Studies, Institute of Psychiatry, Psychology, and Neuroscience, King's College London, London, United Kingdom; Centre for Psychiatric Research and Education, Department of Clinical Neuroscience, Karolinska Institutet, Stockholm, Sweden; Corresponding author at: IDIBAPS: Institut d'Investigacions Biomèdiques August Pi i Sunyer, Rosselló 149, 08036 Barcelona, Spain.
Multisite machine-learning neuroimaging studies, such as those conducted by the ENIGMA Consortium, need to remove the differences between sites to avoid effects of the site (EoS) that may hinder or spuriously help the creation of prediction models, leading to impoverished or inflated prediction accuracy. Unfortunately, we have shown earlier that current Methods Aiming to Remove the EoS (MAREoS, e.g., ComBat) cannot remove complex EoS (e.g., those including interactions between regions), and such complex EoS may bias the accuracy. To overcome this hurdle, groups worldwide are developing novel MAREoS. However, assessing their effectiveness is not straightforward, because EoS may either inflate or shrink the accuracy, and a MAREoS may both remove the EoS and degrade the data. In this work, we propose a strategy to measure the effectiveness of a MAREoS in removing different types of EoS. For MAREoS developers, we provide two multisite MRI datasets with only simple true effects (i.e., detectable by most machine-learning algorithms) and two with only simple EoS (i.e., removable by most MAREoS). First, developers should apply the MAREoS to these datasets and then fit machine-learning algorithms. Second, they should use the formulas we provide to calculate the relative accuracy change associated with the MAREoS in each dataset and derive an EoS-removal effectiveness statistic. We also offer similar datasets and formulas for complex true effects and complex EoS that include first-order interactions. For machine-learning researchers, we provide an extendable benchmark website that shows: a) the types of EoS they should remove for each given machine-learning algorithm and b) the effectiveness of each MAREoS in removing each type of EoS. Importantly, a MAREoS that can remove only simple EoS may suffice for simple machine-learning algorithms, whereas more complex algorithms need a MAREoS that can remove more complex EoS. For instance, ComBat removes all simple EoS, as needed for predictions based on simple lasso algorithms, but it leaves residual complex EoS that may bias the predictions based on standard support vector machine algorithms.
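As a rough illustration of the idea of a dataset with only simple EoS, the following minimal Python sketch simulates two sites in which a feature depends on the site alone (no true effect) while the outcome prevalence differs between sites. It is a toy example under stated assumptions, not the authors' datasets or formulas: the simulation parameters, the per-site mean-centering used as a stand-in for a MAREoS (it is not ComBat), the nearest-centroid classifier, and the `rel_change` expression are all hypothetical choices made for illustration.

```python
import random
from statistics import mean

def simulate(n_per_site, rng):
    """Rows of (site, feature, label) with a site effect but NO true effect."""
    rows = []
    # Hypothetical settings: site A has a +2 offset and 80% positives,
    # site B has no offset and 20% positives.
    for site, offset, p_pos in (("A", 2.0, 0.8), ("B", 0.0, 0.2)):
        for _ in range(n_per_site):
            label = 1 if rng.random() < p_pos else 0
            feature = offset + rng.gauss(0.0, 1.0)  # depends on site only
            rows.append((site, feature, label))
    return rows

def fit_site_means(rows):
    """Per-site feature means, estimated on the training data only."""
    sites = {site for site, _, _ in rows}
    return {s: mean(x for site, x, _ in rows if site == s) for s in sites}

def center_by_site(rows, site_means):
    """Stand-in 'MAREoS': subtract each site's training mean (not ComBat)."""
    return [(s, x - site_means[s], y) for s, x, y in rows]

def fit_centroids(rows):
    """Nearest-centroid classifier on a single feature."""
    return {c: mean(x for _, x, y in rows if y == c) for c in (0, 1)}

def accuracy(rows, centroids):
    hits = 0
    for _, x, y in rows:
        predicted = 1 if abs(x - centroids[1]) < abs(x - centroids[0]) else 0
        hits += predicted == y
    return hits / len(rows)

def run_demo(seed=0, n_per_site=2000):
    rng = random.Random(seed)
    train, test = simulate(n_per_site, rng), simulate(n_per_site, rng)

    # Without harmonization, the classifier exploits the site offset.
    acc_raw = accuracy(test, fit_centroids(train))

    # With the stand-in harmonization, fitted on training data only.
    site_means = fit_site_means(train)
    train_h = center_by_site(train, site_means)
    test_h = center_by_site(test, site_means)
    acc_harm = accuracy(test_h, fit_centroids(train_h))

    # Illustrative relative accuracy change (an assumption, not the paper's formula).
    rel_change = (acc_harm - acc_raw) / acc_raw
    return acc_raw, acc_harm, rel_change

if __name__ == "__main__":
    acc_raw, acc_harm, rel_change = run_demo()
    print(f"accuracy without harmonization: {acc_raw:.3f}")
    print(f"accuracy with site centering:   {acc_harm:.3f}")
    print(f"relative accuracy change:       {rel_change:+.3f}")
```

In this toy setting the accuracy without harmonization is inflated above chance purely because of the site, and it falls toward chance once the simple site offset is removed. A complex EoS, such as one involving interactions between regions, would not be removed by this kind of centering.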