North American Spine Society Journal (Dec 2024)
External validation of SpineNetV2 on a comprehensive set of radiological features for grading lumbosacral disc pathologies
Abstract
Background: In recent years, the integration of Artificial Intelligence (AI) models has revolutionized the diagnosis of Low Back Pain (LBP) and associated disc pathologies. Among these, SpineNetV2 stands out as a state-of-the-art, open-access model for detecting and grading various intervertebral disc pathologies. However, AI models like SpineNetV2 require rigorous validation to establish their robustness and generalizability across diverse patient cohorts and imaging protocols.
Methods: We conducted a retrospective analysis of MR images of 1747 lumbosacral intervertebral discs (IVDs) from 353 patients (mean age, 54 ± 15.4 years, 44.5% female) with various spinal disorders, collected between September 2021 and February 2023 at X-Ray Service s.r.l. The SpineNetV2 system was used to grade 11 distinct lumbosacral disc pathologies from T2-weighted sagittal MR images: Pfirrmann grading, disc narrowing, central canal stenosis, spondylolisthesis, (upper and lower) endplate defects, (upper and lower) marrow changes, (right and left) foraminal stenosis, and disc herniation. Two expert radiologists annotated these discs, and SpineNetV2's grades were compared against their assessments. Performance metrics included accuracy, balanced accuracy, precision, F1 score, Matthews correlation coefficient, Brier score loss, Lin's concordance correlation coefficient, and Cohen's kappa coefficient.
Results: SpineNetV2 demonstrated strong performance across various metrics, with high agreement scores (Cohen's kappa, Lin's concordance correlation coefficient, and Matthews correlation coefficient exceeding 0.7) for most pathologies. However, lower agreement was found for foraminal stenosis and disc herniation, underscoring the limitations of sagittal MR images for evaluating these conditions.
Conclusions: This study highlights the importance of external validation, emphasizing the need for comprehensive assessments of deep learning models. SpineNetV2 exhibits promising results in predicting disc pathologies, with findings guiding further improvements. The open-source release of SpineNetV2 enables researchers to independently validate and extend the model's capabilities. This collaborative approach promotes innovation and accelerates the development of more reliable and comprehensive deep learning tools for the assessment of spine pathology.
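As an illustration of two of the agreement metrics named in the Methods, the sketch below computes Cohen's kappa and Lin's concordance correlation coefficient for a pair of grade lists. It is a minimal sketch, not the study's code: the abstract does not say how the metrics were implemented or whether a weighted kappa was used, so the unweighted kappa, the function names, and the example grades are all assumptions made for illustration.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    p_chance = sum((counts_a[l] / n) * (counts_b[l] / n) for l in labels)
    return (p_observed - p_chance) / (1 - p_chance)


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and systematic offset between raters.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical Pfirrmann-style grades (1-5) for eight discs; NOT study data.
radiologist_grades = [1, 2, 3, 4, 5, 3, 2, 4]
model_grades = [1, 2, 3, 4, 4, 3, 2, 5]

kappa = cohen_kappa(radiologist_grades, model_grades)
ccc = lins_ccc(radiologist_grades, model_grades)
print(f"Cohen's kappa: {kappa:.3f}")
print(f"Lin's CCC:     {ccc:.3f}")
```

Kappa treats grades as unordered categories, so a one-grade disagreement counts the same as a four-grade one, whereas Lin's coefficient treats grades as numeric and rewards near misses. Reporting both, as the study does, gives complementary views of agreement on ordinal scales.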