NeuroImage (Apr 2023)
Cross–scanner harmonization methods for structural MRI may need further work: A comparison study
Abstract
The clinical usefulness of MRI biomarkers for aging and dementia studies relies on precise brain morphological measurements; however, scanner and/or protocol variations may introduce noise or bias. One approach to address this is post-acquisition scan harmonization. In this work, we evaluate deep learning (neural style transfer (NST), CycleGAN, and CGAN), histogram matching (HM), and statistical (ComBat and LongComBat) methods. We used data from participants scanned on both GE and Siemens scanners: a cross-sectional group, referred to as Crossover (n = 113), and a group scanned longitudinally on both scanners, referred to as Longitudinal (n = 454). The goal was to match GE MPRAGE (T1-weighted) scans to Siemens improved resolution MPRAGE scans. Harmonization was performed on raw native and preprocessed (resampled, affine transformed to template space) scans. Cortical thicknesses were measured using FreeSurfer (v.7.1.1). Distributions were compared using Kolmogorov-Smirnov tests. Intra-class correlation (ICC) was used to assess the degree of agreement in the Crossover datasets, and annualized percent change in cortical thickness was calculated to evaluate the Longitudinal datasets. Prior to harmonization, the least agreement was found at the frontal pole (ICC = 0.72) for the raw native scans, and at the caudal anterior cingulate (ICC = 0.76) and frontal pole (ICC = 0.54) for the preprocessed scans. Harmonization with NST, CycleGAN, and HM improved the ICCs of the preprocessed scans at the caudal anterior cingulate (>0.81) and frontal pole (>0.67). In the Longitudinal raw native scans, over- and under-estimations of cortical thickness were observed due to the scanner change. ComBat matched the cortical thickness distributions throughout but was not able to increase the ICCs or remove the effects of scanner changeover in the Longitudinal datasets. CycleGAN and NST performed slightly better at addressing the cortical thickness variations across the scanner change.
However, none of the methods succeeded in harmonizing the Longitudinal dataset. CGAN was the worst performer for both datasets. In conclusion, the performance of the methods was overall similar and region dependent. Future research is needed to improve the existing approaches, since no method outperformed the others in harmonizing the datasets at all regions of interest (ROIs). The findings of this study establish a framework for future research into the scan harmonization problem.
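The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not state which ICC form or which percent-change formula was used, so the code below assumes the two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), and a simple linear annualization relative to the baseline measurement. The function names `icc_2_1` and `annualized_percent_change` are hypothetical and the numbers are illustrative values, not data from the study.

```python
import numpy as np


def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    x has shape (n_subjects, k_scanners). The choice of ICC(2,1) is an
    assumption; the abstract does not specify the ICC form.
    """
    x = np.asarray(x, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-subject means
    col_means = x.mean(axis=0)  # per-scanner means

    # Mean squares from the two-way ANOVA decomposition.
    ms_rows = k * np.sum((row_means - grand) ** 2) / (n - 1)
    ms_cols = n * np.sum((col_means - grand) ** 2) / (k - 1)
    resid = x - row_means[:, None] - col_means[None, :] + grand
    ms_err = np.sum(resid ** 2) / ((n - 1) * (k - 1))

    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


def annualized_percent_change(baseline, follow_up, years):
    """Percent change relative to baseline, divided by the interval in years.

    Assumed linear annualization; the study's exact formula is not given
    in the abstract.
    """
    return 100.0 * (follow_up - baseline) / baseline / years


# Illustrative cortical thickness values (mm) for three subjects on two scanners.
identical = [[2.1, 2.1], [2.5, 2.5], [2.9, 2.9]]
offset = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]  # constant scanner bias

icc_identical = icc_2_1(identical)
icc_offset = icc_2_1(offset)
apc = annualized_percent_change(2.5, 2.4, 2.0)
```

Because ICC(2,1) measures absolute agreement, a constant offset between scanners lowers the score even when subjects are ranked identically, which is the kind of systematic scanner bias that harmonization aims to remove.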