NeuroImage (Apr 2020)

Test-retest reliability of FreeSurfer automated hippocampal subfield segmentation within and across scanners

  • Emma M. Brown,
  • Meghan E. Pierce,
  • Dustin C. Clark,
  • Bruce R. Fischl,
  • Juan E. Iglesias,
  • William P. Milberg,
  • Regina E. McGlinchey,
  • David H. Salat

Journal volume & issue
Vol. 210
p. 116563

Abstract


The human hippocampus is vulnerable to a range of degenerative conditions. As such, accurate in vivo measurement of the hippocampus and hippocampal substructures via neuroimaging is of great interest, both for understanding mechanisms of disease and for use as a biomarker in clinical trials of novel therapeutics. Although total hippocampal volume can be measured relatively reliably, it is critical to understand how this reliability is affected by acquisition on different scanners, as multiple scanning platforms would likely be used in large-scale clinical trials. This is particularly true for hippocampal subregional measurements, which have only recently become measurable through common image processing platforms such as FreeSurfer. Accurate segmentation of these subregions is challenging due to their small size, magnetic resonance imaging (MRI) signal loss in medial temporal regions of the brain, and the lack of contrast for delineating them in standard neuroimaging procedures. Here, we assess the test-retest reliability of the FreeSurfer automated hippocampal subfield segmentation procedure using two Siemens scanner models (a Siemens Trio and a Prismafit Trio upgrade). T1-weighted images were acquired for 11 generally healthy younger participants (two scans on the Trio and one scan on the Prismafit). Each scan was processed for hippocampal segmentation through both the standard cross-sectional stream and the recently released longitudinal pipeline in FreeSurfer v6.0. Test-retest reliability of the volumetric measures was examined for individual subfields using percent volume difference and Dice overlap among scans, as well as intra-class correlation coefficients (ICC). With the inclusion of three time points, reliability was high in the molecular layer, dentate gyrus, and whole hippocampus, with mean volume differences among scans of less than 3%, overlap greater than 80%, and ICC > 0.95.
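The abstract names percent volume difference and Dice overlap but does not state their formulas. The following is a minimal Python sketch assuming the common definitions: the absolute volume difference expressed relative to the mean of the two volumes, and Dice = 2|A∩B| / (|A| + |B|). It also assumes the subfield label masks from different scans have already been resampled onto a common voxel grid. The function names are hypothetical and not part of FreeSurfer.

```python
import itertools

import numpy as np


def percent_volume_difference(v1: float, v2: float) -> float:
    """Absolute volume difference as a percentage of the pair's mean volume.

    Assumed definition; the abstract does not give the exact formula.
    """
    return 100.0 * abs(v1 - v2) / ((v1 + v2) / 2.0)


def dice(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap 2|A n B| / (|A| + |B|) between two label masks.

    Assumes both masks already live on the same voxel grid.
    """
    a = np.asarray(mask_a).astype(bool)
    b = np.asarray(mask_b).astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return float(2.0 * np.logical_and(a, b).sum() / denom)


def mean_pairwise(values, metric) -> float:
    """Average a pairwise metric over all scan pairs.

    With three scans this averages Trio1-Trio2, Trio1-Prismafit and
    Trio2-Prismafit (hypothetical ordering of the inputs).
    """
    pairs = list(itertools.combinations(values, 2))
    return sum(metric(x, y) for x, y in pairs) / len(pairs)
```

For example, `percent_volume_difference(90.0, 110.0)` returns 20.0, and two masks `[1, 1, 0, 0]` and `[0, 1, 1, 0]` share one of four labelled voxels in total, giving a Dice score of 0.5.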
The parasubiculum and hippocampal fissure showed the least improvement in reliability, with mean volume differences greater than 5%, overlap less than 70%, and ICC scores ranging from 0.78 to 0.89. Other subregions, including the CA regions, were stable in their mean volume difference and overlap (75% respectively) and showed improved reliability with the inclusion of three scans (ICC > 0.9). Reliability was generally higher within scanner (Trio-Trio); however, Trio-Prismafit reliability was also high and did not exhibit an obvious bias. These results suggest that the FreeSurfer automated segmentation procedure is a reliable method for measuring total as well as hippocampal subregional volumes, and may be useful in clinical applications, including as an endpoint in future clinical trials of conditions affecting the hippocampus.
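The abstract reports ICC values but does not say which ICC form was used. As an illustration only, the sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single measurement, after Shrout and Fleiss), computed from the ANOVA mean squares of a participants-by-scans matrix of volumes. The function name and data layout are hypothetical.

```python
import numpy as np


def icc_2_1(data: np.ndarray) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    data has shape (n_subjects, k_sessions), e.g. one subfield volume per
    participant (rows) for each scan (columns). Assumed ICC form.
    """
    x = np.asarray(data, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # per-participant means
    col_means = x.mean(axis=0)  # per-scan (session) means

    # Two-way ANOVA sums of squares.
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares.
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return float(
        (ms_rows - ms_error)
        / (ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n)
    )
```

For example, `icc_2_1(np.array([[1, 2], [2, 3], [3, 4]]))` returns about 0.667: every participant ranks identically across the two sessions, but the absolute-agreement form penalizes the constant one-unit offset between sessions. A consistency form such as ICC(3,1) would ignore that offset, which is why the choice of ICC variant matters when asking whether a scanner upgrade introduces a systematic bias.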

Keywords