Diagnostics (Apr 2024)

Artificial Intelligence in Medical Imaging: Analyzing the Performance of ChatGPT and Microsoft Bing in Scoliosis Detection and Cobb Angle Assessment

  • Artur Fabijan,
  • Agnieszka Zawadzka-Fabijan,
  • Robert Fabijan,
  • Krzysztof Zakrzewski,
  • Emilia Nowosławska,
  • Bartosz Polis

DOI
https://doi.org/10.3390/diagnostics14070773
Journal volume & issue
Vol. 14, no. 7
p. 773

Abstract

Read online

Open-source artificial intelligence models (OSAIM) find free applications in various industries, including information technology and medicine. Their clinical potential, especially in supporting diagnosis and therapy, is the subject of increasingly intensive research. Due to the growing interest in artificial intelligence (AI) for diagnostic purposes, we conducted a study evaluating the capabilities of AI models, including ChatGPT and Microsoft Bing, in the diagnosis of single-curve scoliosis based on posturographic radiological images. Two independent neurosurgeons assessed the degree of spinal deformation, selecting 23 cases of severe single-curve scoliosis. Each posturographic image was separately implemented onto each of the mentioned platforms using a set of formulated questions, starting from ‘What do you see in the image?’ and ending with a request to determine the Cobb angle. In the responses, we focused on how these AI models identify and interpret spinal deformations and how accurately they recognize the direction and type of scoliosis as well as vertebral rotation. The Intraclass Correlation Coefficient (ICC) with a ‘two-way’ model was used to assess the consistency of Cobb angle measurements, and its confidence intervals were determined using the F test. Differences in Cobb angle measurements between human assessments and the AI ChatGPT model were analyzed using metrics such as RMSEA, MSE, MPE, MAE, RMSLE, and MAPE, allowing for a comprehensive assessment of AI model performance from various statistical perspectives. The ChatGPT model achieved 100% effectiveness in detecting scoliosis in X-ray images, while the Bing model did not detect any scoliosis. However, ChatGPT had limited effectiveness (43.5%) in assessing Cobb angles, showing significant inaccuracy and discrepancy compared to human assessments. This model also had limited accuracy in determining the direction of spinal curvature, classifying the type of scoliosis, and detecting vertebral rotation. Overall, although ChatGPT demonstrated potential in detecting scoliosis, its abilities in assessing Cobb angles and other parameters were limited and inconsistent with expert assessments. These results underscore the need for comprehensive improvement of AI algorithms, including broader training with diverse X-ray images and advanced image processing techniques, before they can be considered as auxiliary in diagnosing scoliosis by specialists.

Keywords