Evaluation of the ability of large language models to self-diagnose oral diseases
Shiyang Zhuang,
Yuanhao Zeng,
Shaojunjie Lin,
Xirui Chen,
Yishan Xin,
Hongyan Li,
Yiming Lin,
Chaofan Zhang,
Yunzhi Lin
Affiliations
Shiyang Zhuang
Department of Stomatology, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Department of Stomatology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China; School of Stomatology, Fujian Medical University, Fuzhou 350212, China
Yuanhao Zeng
School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876, China
Shaojunjie Lin
School of Stomatology, Fujian Medical University, Fuzhou 350212, China
Xirui Chen
School of Stomatology, Fujian Medical University, Fuzhou 350212, China
Yishan Xin
Department of Orthopaedic Surgery, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Department of Orthopaedic Surgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China; Fujian Provincial Institute of Orthopedics, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Fujian Orthopedic Bone and Joint Disease and Sports Rehabilitation Clinical Medical Research Center, Fuzhou 350212, China
Hongyan Li
Department of Orthopaedic Surgery, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Department of Orthopaedic Surgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China; Fujian Provincial Institute of Orthopedics, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Fujian Orthopedic Bone and Joint Disease and Sports Rehabilitation Clinical Medical Research Center, Fuzhou 350212, China
Yiming Lin
Department of Orthopaedic Surgery, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Department of Orthopaedic Surgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China; Fujian Provincial Institute of Orthopedics, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Fujian Orthopedic Bone and Joint Disease and Sports Rehabilitation Clinical Medical Research Center, Fuzhou 350212, China
Chaofan Zhang
Department of Orthopaedic Surgery, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Department of Orthopaedic Surgery, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China; Fujian Provincial Institute of Orthopedics, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Fujian Orthopedic Bone and Joint Disease and Sports Rehabilitation Clinical Medical Research Center, Fuzhou 350212, China; Corresponding author
Yunzhi Lin
Department of Stomatology, the First Affiliated Hospital, Fujian Medical University, Fuzhou 350005, China; Department of Stomatology, National Regional Medical Center, Binhai Campus of the First Affiliated Hospital, Fujian Medical University, Fuzhou 350212, China; Corresponding author
Summary: Large language models (LLMs) offer potential in primary dental care. We evaluated the diagnostic capabilities of LLMs across various oral diseases and contexts. All LLMs showed diagnostic capabilities for temporomandibular joint disorders, periodontal disease, dental caries, and malocclusion. The choice of prompt did not affect the performance of ChatGPT 3.5. When Chinese was used, the diagnostic ability of ChatGPT 3.5 for pulpitis improved (0% vs. 61.7%, p < 0.001), while its ability to diagnose pericoronitis decreased (8% vs. 0%, p < 0.001). For ChatGPT 4.0, using Chinese improved both (pulpitis: 0% vs. 92%; pericoronitis: 8% vs. 72%; both p < 0.001). Claude 2 exhibited the highest accuracy in diagnosing pulpitis (36%, p = 0.048), and ChatGPT 4.0 showed complete diagnostic capability for pericoronitis. Llama 2 and Claude 3.5 Sonnet exhibited complete diagnostic capability for oral cancer. In conclusion, LLMs may be a useful tool for daily dental care but need further updates.
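The summary does not state which statistical test produced the p-values it reports. As a minimal sketch, assuming a two-sided Fisher's exact test on counts of correct and incorrect diagnoses, a comparison of two accuracy rates could be computed as follows. The function name `fisher_exact_two_sided` and the example counts (25 trials per condition) are hypothetical and are not taken from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (e.g. English vs. Chinese prompts); columns are
    correct vs. incorrect diagnoses. Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x correct answers in row 1,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical counts (assumption): 0 of 25 correct in one language
# versus 23 of 25 correct in the other.
p_value = fisher_exact_two_sided(0, 25, 23, 2)
print(f"p = {p_value:.3g}")
```

With real data, the four arguments would be the observed counts of correct and incorrect diagnoses for each of the two conditions being compared.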