Diagnostics (Mar 2024)

Accuracy of Treatment Recommendations by Pragmatic Evidence Search and Artificial Intelligence: An Exploratory Study

  • Zunaira Baig,
  • Daniel Lawrence,
  • Mahen Ganhewa,
  • Nicola Cirillo

DOI: https://doi.org/10.3390/diagnostics14050527
Journal volume & issue: Vol. 14, no. 5, p. 527

Abstract

An extensive literature is emerging in dentistry with the aim of optimizing clinical practice. Evidence-based guidelines (EBGs) are designed to collate diagnostic criteria and clinical treatments for a range of conditions on the basis of high-quality evidence. Recent advances in Artificial Intelligence (AI) have prompted further questions about its applicability to, and integration into, dentistry. Hence, the aim of this study was to develop a model for assessing the accuracy of treatment recommendations for dental conditions generated by individual clinicians and by AI tools. For this pilot study, a Delphi panel of six experts led by CoTreat AI provided the definitions and developed evidence-based recommendations for subgingival and supragingival calculus. For the rapid review (a pragmatic approach that assesses the evidence base quickly using a systematic methodology), the Ovid Medline database was searched for subgingival and supragingival calculus. Studies were selected and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), and the study complied with the minimum requirements for a restricted systematic review. Treatment recommendations for the same conditions were also sought from ChatGPT (versions 3.5 and 4) and Bard (now Gemini). Adherence to the recommendations of the reference standard was assessed using qualitative content analysis, with agreement scores used to measure interrater reliability. Treatment recommendations generated by the AI programs generally aligned with the current literature, with agreement of up to 75%, although none of these tools except Bard provided data sources. The clinician's rapid review suggested several procedures that may increase the likelihood of overtreatment, as did GPT4. In terms of overall accuracy, GPT4 outperformed all other tools, including the rapid review (Cohen's kappa 0.42 vs. 0.28). In summary, this study provides preliminary observations on the suitability of different evidence-generating methods for informing clinical dental practice.
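
To make the agreement metrics mentioned above concrete, the following Python sketch (not part of the study; the raters, labels, and values are hypothetical) shows how percent agreement and Cohen's kappa can be computed for two raters coding the same set of recommendations as adherent or non-adherent to a reference standard.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)

    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement: for each label, multiply the two raters'
    # marginal proportions, then sum over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)

    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: 8 recommendations coded as adherent ("A") or
# non-adherent ("N") by the reference standard and by an AI tool.
standard = ["A", "A", "N", "A", "N", "A", "A", "N"]
ai_tool  = ["A", "N", "N", "A", "A", "A", "A", "N"]

agreement = sum(a == b for a, b in zip(standard, ai_tool)) / len(standard)
print(f"Percent agreement: {agreement:.0%}")          # 75%
print(f"Cohen's kappa: {cohen_kappa(standard, ai_tool):.2f}")  # 0.47
```

In this toy example the raw agreement is 75%, but Cohen's kappa is lower because part of that agreement would be expected by chance given how often each rater uses each label; this is why the study reports kappa alongside simple agreement.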

Keywords