Performance Test of a Well-Trained Model for Meningioma Segmentation in Health Care Centers: Secondary Analysis Based on Four Retrospective Multicenter Data Sets

Chaoyue Chen; Yuen Teng; Shuo Tan; Zizhou Wang; Lei Zhang; Jianguo Xu

doi:10.2196/44119

Journal of Medical Internet Research (Dec 2023)

Performance Test of a Well-Trained Model for Meningioma Segmentation in Health Care Centers: Secondary Analysis Based on Four Retrospective Multicenter Data Sets

Chaoyue Chen,
Yuen Teng,
Shuo Tan,
Zizhou Wang,
Lei Zhang,
Jianguo Xu

Affiliations

Chaoyue Chen: ORCiD
Yuen Teng: ORCiD
Shuo Tan: ORCiD
Zizhou Wang: ORCiD
Lei Zhang: ORCiD
Jianguo Xu: ORCiD

DOI: https://doi.org/10.2196/44119
Journal volume & issue: Vol. 25
p. e44119

Abstract

Read online

BackgroundConvolutional neural networks (CNNs) have produced state-of-the-art results in meningioma segmentation on magnetic resonance imaging (MRI). However, images obtained from different institutions, protocols, or scanners may show significant domain shift, leading to performance degradation and challenging model deployment in real clinical scenarios. ObjectiveThis research aims to investigate the realistic performance of a well-trained meningioma segmentation model when deployed across different health care centers and verify the methods to enhance its generalization. MethodsThis study was performed in four centers. A total of 606 patients with 606 MRIs were enrolled between January 2015 and December 2021. Manual segmentations, determined through consensus readings by neuroradiologists, were used as the ground truth mask. The model was previously trained using a standard supervised CNN called Deeplab V3+ and was deployed and tested separately in four health care centers. To determine the appropriate approach to mitigating the observed performance degradation, two methods were used: unsupervised domain adaptation and supervised retraining. ResultsThe trained model showed a state-of-the-art performance in tumor segmentation in two health care institutions, with a Dice ratio of 0.887 (SD 0.108, 95% CI 0.903-0.925) in center A and a Dice ratio of 0.874 (SD 0.800, 95% CI 0.854-0.894) in center B. Whereas in the other health care institutions, the performance declined, with Dice ratios of 0.631 (SD 0.157, 95% CI 0.556-0.707) in center C and 0.649 (SD 0.187, 95% CI 0.566-0.732) in center D, as they obtained the MRI using different scanning protocols. The unsupervised domain adaptation showed a significant improvement in performance scores, with Dice ratios of 0.842 (SD 0.073, 95% CI 0.820-0.864) in center C and 0.855 (SD 0.097, 95% CI 0.826-0.886) in center D. Nonetheless, it did not overperform the supervised retraining, which achieved Dice ratios of 0.899 (SD 0.026, 95% CI 0.889-0.906) in center C and 0.886 (SD 0.046, 95% CI 0.870-0.903) in center D. ConclusionsDeploying the trained CNN model in different health care institutions may show significant performance degradation due to the domain shift of MRIs. Under this circumstance, the use of unsupervised domain adaptation or supervised retraining should be considered, taking into account the balance between clinical requirements, model performance, and the size of the available data.

Published in Journal of Medical Internet Research

ISSN: 1438-8871 (Online)
Publisher: JMIR Publications
Country of publisher: Canada
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Medicine: Public aspects of medicine
Website: https://www.jmir.org

About the journal