Discover Oncology (Feb 2025)

A preliminary investigation into the potential, pitfalls, and limitations of large language models for mammography interpretation

  • Filippo Pesapane,
  • Luca Nicosia,
  • Anna Rotili,
  • Silvia Penco,
  • Valeria Dominelli,
  • Chiara Trentin,
  • Federica Ferrari,
  • Giulia Signorelli,
  • Serena Carriero,
  • Enrico Cassano

DOI
https://doi.org/10.1007/s12672-025-02005-4
Journal volume & issue
Vol. 16, no. 1
pp. 1 – 5

Abstract

This study evaluates the capabilities of large language models (LLMs), specifically GPT-4, in interpreting mammographic images. The analysis involved 120 mammographic images, equally divided between cases with and without mammographic findings. Without additional context, the LLM was tasked with generating reports based solely on these images. GPT-4 correctly identified mammographic projections in 53.3% of cases and showed varying degrees of accuracy in identifying microcalcifications and masses. The study highlighted GPT-4’s embryonic interpretative abilities, with a sensitivity of 50.0% and a specificity of 37.5%. However, a significant rate of false positives and false negatives, along with hallucinations, underscored the model’s limitations. This exploratory test offers insights into the potential and risks of using LLMs in mammography interpretation and underscores the need for dedicated training, validation, and regulation of AI tools in healthcare to ensure their reliability and safety in clinical practice.
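
For readers less familiar with the two metrics quoted in the abstract, the following minimal Python sketch shows how sensitivity and specificity are conventionally derived from confusion-matrix counts. The function names and the counts are hypothetical. The counts were chosen only because they reproduce the reported ratios; they are not the study's data, and the abstract does not state the denominators that were actually used.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: TP / (TP + FN)."""
    if tp + fn == 0:
        raise ValueError("no positive cases")
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: TN / (TN + FP)."""
    if tn + fp == 0:
        raise ValueError("no negative cases")
    return tn / (tn + fp)


# Hypothetical counts (NOT from the study), picked so that the ratios
# match the reported 50.0% sensitivity and 37.5% specificity.
tp, fn = 4, 4   # positive cases: detected vs. missed
tn, fp = 3, 5   # negative cases: correctly cleared vs. falsely flagged

print(f"sensitivity = {sensitivity(tp, fn):.1%}")
print(f"specificity = {specificity(tn, fp):.1%}")
```

A specificity below 50% means that, among cases without findings, false positives outnumber correct negative calls, which is consistent with the abstract's emphasis on false positives.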