NeuroImage (Dec 2022)

Sample size requirement for achieving multisite harmonization using structural brain MRI features

  • Pravesh Parekh,
  • Gaurav Vivek Bhalerao,
  • John P. John,
  • G. Venkatasubramanian,
  • Biju Viswanath,
  • Naren P. Rao,
  • Janardhanan C. Narayanaswamy,
  • Palanimuthu T. Sivakumar,
  • Arun Kandasamy,
  • Muralidharan Kesavan,
  • Urvakhsh Meherwan Mehta,
  • Odity Mukherjee,
  • Meera Purushottam,
  • Bhupesh Mehta,
  • Thennarasu Kandavel,
  • B. Binukumar,
  • Jitender Saini,
  • Deepak Jayarajan,
  • A. Shyamsundar,
  • Sydney Moirangthem,
  • K.G. Vijay Kumar,
  • Jayant Mahadevan,
  • Bharath Holla,
  • Jagadisha Thirthalli,
  • Bangalore N. Gangadhar,
  • Pratima Murthy,
  • Mitradas M. Panicker,
  • Upinder S. Bhalla,
  • Sumantra Chattarji,
  • Vivek Benegal,
  • Mathew Varghese,
  • Janardhan Y.C. Reddy,
  • Padinjat Raghu,
  • Mahendra Rao,
  • Sanjeev Jain

Journal volume & issue
Vol. 264
p. 119768

Abstract


When data are pooled across multiple sites, the extracted features are confounded by site effects. Harmonization methods attempt to correct these site effects while preserving the biological variability within the features. However, little is known about the sample size required to effectively learn the harmonization parameters, or about how this requirement changes as the number of sites increases. In this study, we performed experiments to find the minimum sample size required to achieve multisite harmonization (using neuroHarmonize) of volumetric and surface features by leveraging the concept of learning curves. Our first two experiments show that site effects are effectively removed in both a univariate and a multivariate manner; however, it is essential to additionally regress out the effect of covariates from the harmonized data. Our next two experiments, with actual and simulated data, showed that the minimum sample size required for achieving harmonization grows as the average Mahalanobis distance between the sites and their reference distribution increases. We conclude by positing a general framework for understanding site effects using the Mahalanobis distance. Further, we provide insights into the various factors in a cross-validation design for achieving optimal inter-site harmonization.
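The central quantity in the abstract, the average Mahalanobis distance between the sites and their reference distribution, can be illustrated with a short sketch. This is not the authors' code and does not call neuroHarmonize. It assumes (hypothetically) that the reference distribution is the pooled distribution of all sites, summarized by its mean vector and covariance matrix, that each site is summarized by its mean feature vector, and that per-site distances are simply averaged. The function names and the toy data generator are illustrative only.

```python
import numpy as np


def site_mahalanobis_distances(features, sites):
    """Mahalanobis distance from each site's mean feature vector to a
    reference distribution.

    ASSUMPTION: the reference is the pooled mean and covariance of all
    sites; the paper's exact definition of the reference may differ.
    """
    features = np.asarray(features, dtype=float)
    sites = np.asarray(sites)
    ref_mean = features.mean(axis=0)
    ref_cov = np.cov(features, rowvar=False)
    # Pseudo-inverse tolerates a singular covariance (e.g. collinear features)
    ref_precision = np.linalg.pinv(ref_cov)
    distances = {}
    for site in np.unique(sites):
        diff = features[sites == site].mean(axis=0) - ref_mean
        distances[site] = float(np.sqrt(diff @ ref_precision @ diff))
    return distances


def average_site_distance(features, sites):
    """Hypothetical summary statistic: mean of the per-site distances."""
    d = site_mahalanobis_distances(features, sites)
    return sum(d.values()) / len(d)


def simulate_sites(shift, n_per_site=200, n_features=3, seed=0):
    """Toy data: three sites whose means are offset by multiples of `shift`."""
    rng = np.random.default_rng(seed)
    X = np.vstack([
        rng.normal(loc=k * shift, size=(n_per_site, n_features))
        for k in range(3)
    ])
    s = np.repeat(["A", "B", "C"], n_per_site)
    return X, s


if __name__ == "__main__":
    for shift in (0.1, 1.0):
        X, s = simulate_sites(shift)
        print(shift, round(average_site_distance(X, s), 3))
```

In this toy setting, widening the offset between site means raises the average distance; the abstract reports that the minimum sample size needed for harmonization grows with this quantity. Using the pooled data as the reference is one simple choice; a study could instead use a designated reference site or a model-based reference.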

Keywords