npj Digital Medicine (Feb 2021)

Handling missing MRI sequences in deep learning segmentation of brain metastases: a multicenter study

  • Endre Grøvik,
  • Darvin Yi,
  • Michael Iv,
  • Elizabeth Tong,
  • Line Brennhaug Nilsen,
  • Anna Latysheva,
  • Cathrine Saxhaug,
  • Kari Dolven Jacobsen,
  • Åslaug Helland,
  • Kyrre Eeg Emblem,
  • Daniel L. Rubin,
  • Greg Zaharchuk

DOI: https://doi.org/10.1038/s41746-021-00398-4
Journal volume & issue: Vol. 4, no. 1, pp. 1–7

Abstract

The purpose of this study was to assess the clinical value of a deep learning (DL) model for automatic detection and segmentation of brain metastases, in which a neural network is trained on four distinct MRI sequences using an input-level dropout layer; this simulates missing MRI sequences by training on the full set and all possible subsets of the input data. This retrospective, multicenter study evaluated 165 patients with brain metastases. The proposed input-level dropout (ILD) model was trained on multisequence MRI from 100 patients, validated on 10 patients, and tested on 55 patients whose examinations lacked one of the four MRI sequences used for training. The segmentation results were compared with those of a state-of-the-art DeepLab V3 model. The MR sequences in the training set included pre- and post-gadolinium (Gd) T1-weighted 3D fast spin echo, post-Gd T1-weighted inversion recovery (IR) prepped fast spoiled gradient echo, and 3D fluid-attenuated inversion recovery (FLAIR), whereas the test set did not include the IR-prepped image series. Ground truth segmentations were established by experienced neuroradiologists. Performance was evaluated using precision, recall, intersection over union (IoU) score, Dice score, and receiver operating characteristic (ROC) curve statistics, and the Wilcoxon rank sum test was used to compare the two neural networks. The area under the ROC curve (AUC), averaged across all test cases, was 0.989 ± 0.029 for the ILD-model and 0.989 ± 0.023 for the DeepLab V3 model (p = 0.62). Compared with the DeepLab V3 model, the ILD-model showed a significantly higher Dice score (0.795 ± 0.104 vs. 0.774 ± 0.104, p = 0.017) and IoU score (0.561 ± 0.225 vs. 0.492 ± 0.186, p < 0.001), and a significantly lower average number of false positives per patient (3.6 vs. 7.0, p < 0.001) when using a 10 mm³ lesion-size limit.
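The overlap metrics named above have standard voxel-wise definitions, and the false-positive count depends on how lesions and the 10 mm³ size limit are handled. The sketch below computes precision, recall, Dice, and IoU for two binary masks with NumPy, and counts lesion-level false positives. It assumes that lesions are face-connected components, that the voxel volume in mm³ is known, and that a predicted lesion at or above the size limit that overlaps no ground-truth voxel is a false positive; the study's exact lesion-matching rule is not stated in the abstract, and all function names here are hypothetical.

```python
from collections import deque

import numpy as np


def _safe_div(num: float, den: float) -> float:
    # Returns 0.0 when the denominator is zero (e.g. both masks empty); a convention of this sketch.
    return float(num) / float(den) if den > 0 else 0.0


def overlap_metrics(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Voxel-wise precision, recall, Dice and IoU for two binary masks of equal shape."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    tp = int(np.logical_and(pred, truth).sum())   # predicted and true
    fp = int(np.logical_and(pred, ~truth).sum())  # predicted but not true
    fn = int(np.logical_and(~pred, truth).sum())  # true but missed
    return {
        "precision": _safe_div(tp, tp + fp),
        "recall": _safe_div(tp, tp + fn),
        "dice": _safe_div(2 * tp, 2 * tp + fp + fn),
        "iou": _safe_div(tp, tp + fp + fn),
    }


def connected_components(mask: np.ndarray) -> list:
    """Face-connected components (6-connectivity in 3D) as lists of voxel index tuples."""
    mask = mask.astype(bool)
    seen = np.zeros(mask.shape, dtype=bool)
    # One unit step backward and forward along each axis.
    offsets = []
    for axis in range(mask.ndim):
        for step in (-1, 1):
            off = [0] * mask.ndim
            off[axis] = step
            offsets.append(off)
    components = []
    for start in zip(*np.nonzero(mask)):
        start = tuple(int(c) for c in start)
        if seen[start]:
            continue
        seen[start] = True
        queue, voxels = deque([start]), []
        while queue:  # breadth-first flood fill from the seed voxel
            v = queue.popleft()
            voxels.append(v)
            for off in offsets:
                n = tuple(a + b for a, b in zip(v, off))
                inside = all(0 <= c < s for c, s in zip(n, mask.shape))
                if inside and mask[n] and not seen[n]:
                    seen[n] = True
                    queue.append(n)
        components.append(voxels)
    return components


def count_false_positive_lesions(pred, truth, voxel_volume_mm3, min_volume_mm3=10.0):
    """Count predicted lesions >= min_volume_mm3 that overlap no ground-truth voxel."""
    truth = truth.astype(bool)
    count = 0
    for voxels in connected_components(pred):
        if len(voxels) * voxel_volume_mm3 < min_volume_mm3:
            continue  # below the lesion-size limit, so ignored
        if not any(truth[v] for v in voxels):
            count += 1
    return count
```

With 1 mm³ voxels, for instance, an isolated single-voxel prediction falls below the 10 mm³ limit and is not counted. Because the limit is expressed in mm³ rather than voxels, the voxel spacing of each scan matters when applying it.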
The ILD-model, trained on all possible combinations of four MRI sequences, may facilitate accurate detection and segmentation of brain metastases on a multicenter basis, even when input MRI sequences are missing from the test cohort.
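The abstract describes input-level dropout only at a high level. One plausible reading, sketched below with NumPy, zeroes a randomly chosen subset of the four input channels for each training sample while always keeping at least one, so that the network sees the full set and all 15 non-empty subsets of sequences; at test time the absent sequence (here the IR-prepped series) is zero-filled in the same way. Zero-filling, uniform sampling over subsets, the channel order, and the names `SEQUENCES`, `input_level_dropout`, and `zero_fill_missing` are assumptions of this sketch, not details confirmed by the abstract.

```python
import itertools

import numpy as np

# Hypothetical channel order for the four sequences named in the abstract.
SEQUENCES = ("T1_FSE_pre_Gd", "T1_FSE_post_Gd", "T1_IR_FSPGR_post_Gd", "FLAIR_3D")


def all_nonempty_subsets(n_channels: int) -> list:
    """Every non-empty combination of channel indices (2**n - 1 of them)."""
    return [
        subset
        for r in range(1, n_channels + 1)
        for subset in itertools.combinations(range(n_channels), r)
    ]


def input_level_dropout(batch: np.ndarray, rng: np.random.Generator):
    """Zero out a random channel subset per sample of an (N, C, ...) batch.

    Each sample keeps one non-empty subset, drawn uniformly (an assumption),
    so the full set and every partial set of sequences appear during training.
    """
    n, c = batch.shape[:2]
    subsets = all_nonempty_subsets(c)
    out = batch.copy()  # leave the caller's array untouched
    kept = []
    for i in range(n):
        keep = subsets[rng.integers(len(subsets))]
        drop = np.ones(c, dtype=bool)
        drop[list(keep)] = False
        out[i, drop] = 0  # dropped sequences look like missing ones: all zeros
        kept.append(keep)
    return out, kept


def zero_fill_missing(volume: np.ndarray, available) -> np.ndarray:
    """Test time: keep the available channels of a (C, ...) volume and zero the rest."""
    out = np.zeros_like(volume)
    idx = list(available)
    out[idx] = volume[idx]
    return out
```

Under the channel order assumed above, a test case lacking the IR-prepped series would be prepared with `zero_fill_missing(volume, available=(0, 1, 3))`, so the network receives the same zero-filled pattern it encountered for that subset during training.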