NeuroImage: Clinical (Jan 2021)

Development and evaluation of a manual segmentation protocol for deep grey matter in multiple sclerosis: Towards accelerated semi-automated references

  • Alexandra de Sitter,
  • Jessica Burggraaff,
  • Fabian Bartel,
  • Miklos Palotai,
  • Yaou Liu,
  • Jorge Simoes,
  • Serena Ruggieri,
  • Katharina Schregel,
  • Stefan Ropele,
  • Maria A. Rocca,
  • Claudio Gasperini,
  • Antonio Gallo,
  • Menno M. Schoonheim,
  • Michael Amann,
  • Marios Yiannakas,
  • Deborah Pareto,
  • Mike P. Wattjes,
  • Jaume Sastre-Garriga,
  • Ludwig Kappos,
  • Massimo Filippi,
  • Christian Enzinger,
  • Jette Frederiksen,
  • Bernard Uitdehaag,
  • Charles R.G. Guttmann,
  • Frederik Barkhof,
  • Hugo Vrenken

Journal volume & issue
Vol. 30
p. 102659

Abstract

Background: Deep grey matter (dGM) structures, particularly the thalamus, are clinically relevant in multiple sclerosis (MS). However, segmentation of dGM in MS is challenging, and labeled MS-specific reference sets are needed for objective evaluation and training of new methods. Objectives: This study aimed to (i) create a standardized protocol for manual delineation of dGM; (ii) evaluate the reliability of the protocol with multiple raters; and (iii) evaluate the accuracy of a fast semi-automated segmentation approach (FASTSURF). Methods: A standardized manual segmentation protocol for the caudate nucleus, putamen, and thalamus was created and applied by three raters to multi-center 3D T1-weighted MRI scans of 23 MS patients and 12 controls. Intra- and inter-rater agreement was assessed using the intra-class correlation coefficient (ICC), and spatial overlap using the Jaccard index (JI) and the generalized conformity index (CIgen). FASTSURF reconstructed full segmentations from sparse delineations; its accuracy was assessed both volumetrically and spatially. Results: All structures showed excellent agreement on expert manual outlines: intra-rater JI > 0.83; inter-rater ICC ≥ 0.76 and CIgen ≥ 0.74. FASTSURF reproduced the manual references excellently, with ICC ≥ 0.97 and JI ≥ 0.92. Conclusions: The manual dGM segmentation protocol showed excellent reproducibility within and between raters. Moreover, combined with FASTSURF, a reliable reference set of dGM segmentations can be produced with a lower workload.
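To make the spatial overlap measures named in the Methods concrete, the following is a minimal sketch of how JI and CIgen might be computed. It is an illustration, not the authors' implementation. It assumes each segmentation is represented as a set of voxel coordinates, and it assumes the commonly used pairwise definition of CIgen (the sum of pairwise intersections divided by the sum of pairwise unions); the abstract itself does not spell out the formula. The function names and the toy rater data are hypothetical.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Jaccard index |A ∩ B| / |A ∪ B| for two sets of voxel coordinates."""
    union = a | b
    if not union:
        raise ValueError("both segmentations are empty")
    return len(a & b) / len(union)


def generalized_conformity_index(segmentations):
    """Assumed definition: sum of pairwise intersection sizes divided by
    the sum of pairwise union sizes, over all pairs of raters."""
    pairs = list(combinations(segmentations, 2))
    if not pairs:
        raise ValueError("need at least two segmentations")
    intersection_total = sum(len(a & b) for a, b in pairs)
    union_total = sum(len(a | b) for a, b in pairs)
    if union_total == 0:
        raise ValueError("all segmentations are empty")
    return intersection_total / union_total


# Hypothetical toy data: three raters outlining the same small structure,
# each voxel given as an (x, y, z) index.
rater_1 = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)}
rater_2 = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0)}
rater_3 = {(0, 0, 0), (0, 0, 1), (0, 1, 1)}

ji_12 = jaccard_index(rater_1, rater_2)
ci_gen = generalized_conformity_index([rater_1, rater_2, rater_3])
print(f"JI(rater 1, rater 2) = {ji_12:.3f}")
print(f"CIgen(all raters)    = {ci_gen:.3f}")
```

Under this definition, CIgen reduces to the Jaccard index when only two raters are compared, which is why it is a natural inter-rater extension of the pairwise overlap measure. For real MRI data the voxel sets would be derived from binary label volumes rather than typed by hand.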

Keywords