Diagnostics (Mar 2024)

Accuracy of Treatment Recommendations by Pragmatic Evidence Search and Artificial Intelligence: An Exploratory Study

  • Zunaira Baig,
  • Daniel Lawrence,
  • Mahen Ganhewa,
  • Nicola Cirillo

DOI: https://doi.org/10.3390/diagnostics14050527
Journal volume & issue: Vol. 14, no. 5, p. 527

Abstract

An extensive literature is emerging in dentistry with the aim of optimizing clinical practice. Evidence-based guidelines (EBGs) are designed to collate diagnostic criteria and clinical treatments for a range of conditions on the basis of high-quality evidence. Recent advances in Artificial Intelligence (AI) have prompted further questions about its applicability to, and integration into, dentistry. Hence, the aim of this study was to develop a model for assessing the accuracy of treatment recommendations for dental conditions generated by individual clinicians and by AI tools. For this pilot study, a Delphi panel of six experts led by CoTreat AI provided the definitions and developed evidence-based recommendations for subgingival and supragingival calculus. For the rapid review (a pragmatic approach that assesses the evidence base quickly using a systematic methodology), the Ovid Medline database was searched for subgingival and supragingival calculus. Studies were selected and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA), and the study complied with the minimum requirements for a restricted systematic review. Treatment recommendations for the same conditions were also sought from ChatGPT (versions 3.5 and 4) and Bard (now Gemini). Adherence to the recommendations of the reference standard was assessed using qualitative content analysis, with agreement scores used to measure interrater reliability. Treatment recommendations generated by the AI programs generally aligned with the current literature, with agreement of up to 75%, although none of these tools except Bard provided data sources. The clinician's rapid review suggested several procedures that may increase the likelihood of overtreatment, as did GPT4. In terms of overall accuracy, GPT4 outperformed all other tools, including the rapid review (Cohen's kappa 0.42 vs. 0.28). In summary, this study provides preliminary observations on the suitability of different evidence-generating methods for informing clinical dental practice.
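
To make the agreement metrics mentioned above concrete, the following Python sketch (not part of the study; the raters, labels, and values are hypothetical) shows how percent agreement and Cohen's kappa can be computed for two raters coding the same set of recommendations as adherent or non-adherent to a reference standard.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from each
    rater's marginal label frequencies.
    """
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)

    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement: for each label, multiply the two raters'
    # marginal proportions, then sum over labels.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum((freq_a[label] / n) * (freq_b[label] / n) for label in freq_a)

    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: 8 recommendations coded as adherent ("A") or
# non-adherent ("N") by the reference standard and by an AI tool.
standard = ["A", "A", "N", "A", "N", "A", "A", "N"]
ai_tool  = ["A", "N", "N", "A", "A", "A", "A", "N"]

agreement = sum(a == b for a, b in zip(standard, ai_tool)) / len(standard)
print(f"Percent agreement: {agreement:.0%}")          # 75%
print(f"Cohen's kappa: {cohen_kappa(standard, ai_tool):.2f}")  # 0.47
```

In this toy example the raw agreement is 75%, but Cohen's kappa is lower because part of that agreement would be expected by chance given how often each rater uses each label; this is why the study reports kappa alongside simple agreement.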

Keywords