JMIR Medical Informatics (Nov 2024)

Exploring the Potential of Claude 3 Opus in Renal Pathological Diagnosis: Performance Evaluation

  • Xingyuan Li,
  • Ke Liu,
  • Yanlin Lang,
  • Zhonglin Chai,
  • Fang Liu

DOI
https://doi.org/10.2196/65033
Journal volume & issue
Vol. 12
p. e65033

Abstract


Background: Artificial intelligence (AI) has shown great promise in assisting medical diagnosis, but its application in renal pathology remains limited.

Objective: We evaluated the performance of an advanced AI language model, Claude 3 Opus (Anthropic), in generating diagnostic descriptions for renal pathological images.

Methods: We carefully curated a dataset of 100 representative renal pathological images from the Diagnostic Atlas of Renal Pathology (3rd edition). The images were selected to cover a wide spectrum of common renal diseases, with the aim of producing a balanced and comprehensive dataset. Claude 3 Opus generated a diagnostic description for each image, and 2 pathologists scored each description on clinical relevance, accuracy, fluency, completeness, and overall value.

Results: Claude 3 Opus achieved a high mean score in language fluency (3.86) but lower scores in clinical relevance (1.75), accuracy (1.55), completeness (2.01), and overall value (1.75). Performance varied across disease types. Interrater agreement was substantial for relevance (κ=0.627) and overall value (κ=0.589) and moderate for accuracy (κ=0.485) and completeness (κ=0.458).

Conclusions: Claude 3 Opus shows potential in generating fluent renal pathology descriptions but needs improvement in accuracy and clinical value. The AI’s performance varied across disease types. Addressing the limitations of single-source data and incorporating comparative analyses with other AI approaches are essential steps for future research. Further optimization and validation are needed for clinical applications.
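The κ values reported in the Results quantify agreement between the 2 pathologists beyond what chance alone would produce. The abstract does not state which kappa variant was used. The sketch below assumes plain unweighted Cohen's kappa for two raters, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed proportion of items given identical scores and p_e is the agreement expected from each rater's marginal score frequencies. Because the rubric scores are ordinal, the study may instead have used a weighted kappa. The function name `cohen_kappa` and the example scores are hypothetical illustrations, not the study's data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: every score is treated as a nominal category, so a 1-vs-4
    disagreement counts the same as a 1-vs-2 disagreement (no weighting).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items both raters scored identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: for each category, the product of the two raters'
    # marginal proportions, summed over categories.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)

    if p_e == 1:
        # Both raters used one identical category for every item; kappa is 0/0.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical accuracy scores from two pathologists for 8 descriptions.
    pathologist_1 = [1, 2, 2, 3, 1, 4, 2, 1]
    pathologist_2 = [1, 2, 3, 3, 1, 4, 1, 1]
    print(round(cohen_kappa(pathologist_1, pathologist_2), 3))  # 0.652
```

In this made-up example the raters agree on 6 of 8 items (p_o = 0.75), and their marginal frequencies give p_e = 18/64, so κ = 15/23 ≈ 0.652. The value is noticeably lower than the raw 75% agreement because part of that agreement is expected by chance.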