Evaluating the comprehension and accuracy of ChatGPT's responses to diabetes-related questions in Urdu compared to English

Seyreen Faisal; Tafiya Erum Kamran; Rimsha Khalid; Zaira Haider; Yusra Siddiqui; Nadia Saeed; Sunaina Imran; Romaan Faisal; Misbah Jabeen

doi:10.1177/20552076241289730

Digital Health (Oct 2024)

Affiliations

Seyreen Faisal: Shifa Tameer-e-Millat University, Islamabad, Pakistan
Tafiya Erum Kamran: Shifa Tameer-e-Millat University, Islamabad, Pakistan
Rimsha Khalid: Shifa Tameer-e-Millat University, Islamabad, Pakistan
Zaira Haider: Shifa Tameer-e-Millat University, Islamabad, Pakistan
Yusra Siddiqui: Shifa Tameer-e-Millat University, Islamabad, Pakistan
Nadia Saeed: Department of Internal Medicine, Shifa Tameer-e-Millat University, Islamabad, Pakistan
Sunaina Imran: Shifa Tameer-e-Millat University, Islamabad, Pakistan
Romaan Faisal: Shaheed Zulfiqar Ali Bhutto Medical University, Islamabad, Pakistan
Misbah Jabeen: Department of Endocrinology, Shifa International Hospital, Islamabad, Pakistan

DOI: https://doi.org/10.1177/20552076241289730
Journal volume & issue: Vol. 10

Abstract

Introduction: Patients with diabetes require healthcare and information that are accurate and extensive. Large language models (LLMs) such as ChatGPT have the capacity to provide such information. This study aimed to determine (a) the comprehensiveness of ChatGPT's responses in Urdu to diabetes-related questions and (b) the accuracy of ChatGPT's Urdu responses compared with its English responses.

Methods: A cross-sectional observational study was conducted. Two reviewers experienced in internal medicine and endocrinology graded ChatGPT's Urdu and English responses to 53 questions on diabetes knowledge, lifestyle, and prevention. A senior reviewer resolved discrepancies. The Urdu responses were assessed for comprehension and accuracy, then compared with their English counterparts.

Results: Of the 53 Urdu responses, only two (3.8%) were graded as comprehensive and five (9.4%) as correct but inadequate. The largest share, 25 of 53 (47.2%), were graded as mixed, containing both correct and incorrect or outdated data. In the comparative grading of accuracy, no Urdu response (0.0%) was considered more accurate than its English counterpart, and an overwhelming majority, 49 of 53 (92.5%), were less accurate than the English responses.

Conclusion: Although ChatGPT's ability to retrieve information about diabetes is impressive, it can be used only as an adjunct rather than as a sole source of information. Further work is needed to optimize Urdu responses in medical contexts so that the model can approach its potential.

Published in Digital Health

ISSN: 2055-2076 (Online)
Publisher: SAGE Publishing
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://journals.sagepub.com/home/dhj