Leveraging Large Language Models in the delivery of post-operative dental care: a comparison between an embedded GPT model and ChatGPT

Itrat Batool; Nighat Naved; Syed Murtaza Raza Kazmi; Fahad Umer

doi:10.1038/s41405-024-00226-3

BDJ Open (Jun 2024)

Affiliations

Itrat Batool: Section of Dentistry, Department of Surgery, Aga Khan University Hospital
Nighat Naved: Section of Dentistry, Department of Surgery, Aga Khan University Hospital
Syed Murtaza Raza Kazmi: Section of Dentistry, Department of Surgery, Aga Khan University Hospital
Fahad Umer: Section of Dentistry, Department of Surgery, Aga Khan University Hospital

Journal volume & issue: Vol. 10, no. 1, pp. 1–7

Abstract

Objective: This study underscores the transformative role of Artificial Intelligence (AI) in healthcare, particularly the promising applications of Large Language Models (LLMs) in the delivery of post-operative dental care. It aims to evaluate the performance of an embedded GPT model and compare it with ChatGPT-3.5 turbo. The assessment covers response accuracy, clarity, relevance, and up-to-date knowledge in addressing patient concerns and facilitating informed decision-making.

Material and methods: An embedded GPT model, employing GPT-3.5-16k, was built via GPT-trainer to answer post-operative questions in four dental specialties: Operative Dentistry & Endodontics, Periodontics, Oral & Maxillofacial Surgery, and Prosthodontics. The generated responses were rated on a Likert scale by thirty-six dental experts, nine from each specialty, to assess the embedded GPT model's performance and compare it with GPT-3.5 turbo. Content validation used a quantitative Content Validity Index (CVI), calculated at both the item level (I-CVI) and the scale level (S-CVI/Ave). To adjust the I-CVI for chance agreement, a modified kappa statistic (K*) was computed.

Results: The overall content validity of responses generated by the embedded GPT model and ChatGPT was 65.62% and 61.87%, respectively. The embedded GPT model outperformed ChatGPT on accuracy (62.5% vs. 52.5%) and clarity (72.5% vs. 67.5%). The two models performed equally well on relevance and up-to-date knowledge.

Conclusion: The embedded GPT model showed better results than ChatGPT in providing post-operative dental care, emphasizing the benefits of embedding and prompt engineering and paving the way for future advancements in healthcare applications.
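
The content-validity measures named in the abstract (I-CVI, S-CVI/Ave, and the modified kappa K*) can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in the abstract: a 4-point rating scale on which ratings of 3 or 4 count as agreement, and the commonly used formulation in which the chance-agreement probability is Pc = C(N, A) × 0.5^N and K* = (I-CVI − Pc) / (1 − Pc), with N experts and A agreeing experts. The function names and the sample ratings are hypothetical and are not taken from the study.

```python
from math import comb


def count_agreeing(ratings, agree_threshold=3):
    """Number of experts whose rating meets the agreement threshold (assumed: >= 3)."""
    return sum(1 for r in ratings if r >= agree_threshold)


def item_cvi(ratings, agree_threshold=3):
    """I-CVI: proportion of experts who rate the item as acceptable."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return count_agreeing(ratings, agree_threshold) / len(ratings)


def modified_kappa(ratings, agree_threshold=3):
    """K*: I-CVI adjusted for the probability of chance agreement."""
    n = len(ratings)
    i_cvi = item_cvi(ratings, agree_threshold)
    agreeing = count_agreeing(ratings, agree_threshold)
    # Probability that exactly `agreeing` of n experts agree by chance,
    # treating each rating as a fair coin flip.
    p_chance = comb(n, agreeing) * 0.5 ** n
    return (i_cvi - p_chance) / (1 - p_chance)


def scale_cvi_ave(items, agree_threshold=3):
    """S-CVI/Ave: mean of the I-CVI values across all items."""
    if not items:
        raise ValueError("at least one item is required")
    return sum(item_cvi(r, agree_threshold) for r in items) / len(items)


if __name__ == "__main__":
    # Hypothetical ratings from nine experts for a single response.
    example = [4, 4, 3, 4, 3, 4, 4, 2, 3]
    print(f"I-CVI = {item_cvi(example):.3f}")
    print(f"K*    = {modified_kappa(example):.3f}")
```

With nine raters, as in each specialty panel described above, a single dissenting expert gives an I-CVI of 8/9 (about 0.889) and a K* of about 0.887 under these assumptions, so the chance correction is small for panels of this size.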

Published in BDJ Open

ISSN: 2056-807X (Online)
Publisher: Nature Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine: Dentistry
Website: https://www.nature.com/bdjopen/
