Privacy-preserving harmonization via distributed ComBat
Andrew A. Chen, Chongliang Luo, Yong Chen, Russell T. Shinohara, Haochang Shou
Affiliations
Andrew A. Chen
Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, United States; Corresponding author at: Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States.
Chongliang Luo
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
Yong Chen
Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States
Russell T. Shinohara
Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, United States
Haochang Shou
Penn Statistics in Imaging and Visualization Center, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, PA 19104, United States; Center for Biomedical Image Computing and Analytics, University of Pennsylvania, Philadelphia, PA 19104, United States
Challenges in clinical data sharing and the need to protect data privacy have led to the development and popularization of methods that do not require directly transferring patient data. In neuroimaging, integration of data across multiple institutions also introduces unwanted biases driven by scanner differences. Several research groups have shown that these scanner effects can severely affect downstream analyses. To address the need to remove scanner effects in a distributed data setting, we introduce distributed ComBat, an adaptation of a popular harmonization method for multivariate data that borrows information across features. We present our fast and simple distributed algorithm and, using data from the Alzheimer’s Disease Neuroimaging Initiative, show that it yields results equivalent to those of ComBat applied to pooled data. Our method enables harmonization while ensuring maximal privacy protection, thus facilitating a broad range of downstream analyses in functional and structural imaging studies.
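As a rough illustration of how a model can be fit without transferring patient-level data, the sketch below shows one common ingredient of distributed estimation: each site shares only aggregate summaries (here the cross-products X^T X and X^T y of a least-squares fit), and a coordinating site sums them. This is not the authors' algorithm and omits ComBat's site-specific location and scale terms and its empirical Bayes step; the function names, the toy covariate (age), and the data values are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def site_summaries(X: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[Matrix, Vector]:
    """Run locally at one site: return X^T X and X^T y, never the raw rows."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    return xtx, xty


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (M[i][n] - tail) / M[i][i]
    return beta


def aggregate_and_fit(summaries: Sequence[Tuple[Matrix, Vector]]) -> Vector:
    """Run at the coordinating site: sum the summaries, then solve the normal equations."""
    p = len(summaries[0][1])
    xtx = [[sum(s[0][a][b] for s in summaries) for b in range(p)] for a in range(p)]
    xty = [sum(s[1][a] for s in summaries) for a in range(p)]
    return solve(xtx, xty)


# Hypothetical toy data: design columns are (intercept, age); y is one imaging feature.
site_a_X = [[1.0, 60.0], [1.0, 70.0], [1.0, 80.0]]
site_a_y = [3.1, 2.9, 2.6]
site_b_X = [[1.0, 65.0], [1.0, 75.0]]
site_b_y = [3.0, 2.8]

# Distributed fit: only the per-site summaries leave each site.
beta_distributed = aggregate_and_fit(
    [site_summaries(site_a_X, site_a_y), site_summaries(site_b_X, site_b_y)]
)

# Reference fit on pooled rows, which is what data-sharing constraints would prevent.
beta_pooled = aggregate_and_fit(
    [site_summaries(site_a_X + site_b_X, site_a_y + site_b_y)]
)
```

Because the cross-products are sums over subjects, adding the per-site summaries reproduces the pooled quantities, so `beta_distributed` and `beta_pooled` agree up to floating-point rounding.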