Frontiers in Medicine (Sep 2023)

Magnetic resonance imaging based deep-learning model: a rapid, high-performance, automated tool for testicular volume measurements

  • Kailun Sun,
  • Chanyuan Fan,
  • Zhaoyan Feng,
  • Xiangde Min,
  • Yu Wang,
  • Ziyan Sun,
  • Yan Li,
  • Wei Cai,
  • Xi Yin,
  • Peipei Zhang,
  • Qiuyu Liu,
  • Liming Xia

DOI
https://doi.org/10.3389/fmed.2023.1277535
Journal volume & issue
Vol. 10

Abstract

Background: Testicular volume (TV) is an essential parameter for monitoring testicular functions and pathologies. Nevertheless, current measurement tools, including orchidometers and ultrasonography, encounter challenges in obtaining accurate and personalized TV measurements.

Purpose: Based on magnetic resonance imaging (MRI), this study aimed to establish a deep learning model and evaluate its efficacy in segmenting the testes and measuring TV.

Materials and methods: The study cohort consisted of retrospectively collected patient data (N = 200) and a prospectively collected dataset comprising 10 healthy volunteers. The retrospective dataset was randomly divided into training and independent validation sets at an 8:2 ratio. Each of the 10 healthy volunteers underwent 5 scans (forming the testing dataset) to evaluate measurement reproducibility. A ResUNet algorithm was applied to segment the testes. The volume of each testis was calculated by multiplying the voxel volume by the number of voxels. Masks manually delineated by experts were used as ground truth to assess the performance of the deep learning model.

Results: The deep learning model achieved a mean Dice score of 0.926 ± 0.034 (0.921 ± 0.026 for the left testis and 0.926 ± 0.034 for the right testis) in the validation cohort and a mean Dice score of 0.922 ± 0.02 (0.931 ± 0.019 for the left testis and 0.932 ± 0.022 for the right testis) in the testing cohort. There was a strong correlation between the manual and automated TV (R² ranging from 0.974 to 0.987 in the validation cohort; R² ranging from 0.936 to 0.973 in the testing cohort). The volume differences between the manual and automated measurements were 0.838 ± 0.991 (0.209 ± 0.665 for left TV [LTV] and 0.630 ± 0.728 for right TV [RTV]) in the validation cohort and 0.815 ± 0.824 (0.303 ± 0.664 for LTV and 0.511 ± 0.444 for RTV) in the testing cohort. Additionally, the deep learning model exhibited excellent reproducibility (intraclass correlation >0.9) in determining TV.

Conclusion: The MRI-based deep learning model is an accurate and reliable tool for measuring TV.

Keywords