Digital Health (Oct 2024)

Evaluating the comprehension and accuracy of ChatGPT's responses to diabetes-related questions in Urdu compared to English

  • Seyreen Faisal,
  • Tafiya Erum Kamran,
  • Rimsha Khalid,
  • Zaira Haider,
  • Yusra Siddiqui,
  • Nadia Saeed,
  • Sunaina Imran,
  • Romaan Faisal,
  • Misbah Jabeen

DOI
https://doi.org/10.1177/20552076241289730
Journal volume & issue
Vol. 10

Abstract

Introduction: Patients with diabetes require healthcare and information that are accurate and extensive. Large language models (LLMs) like ChatGPT have the potential to provide such information. This study aimed to determine (a) the comprehensiveness of ChatGPT's responses in Urdu to diabetes-related questions and (b) the accuracy of ChatGPT's Urdu responses compared with its English responses.

Methods: A cross-sectional observational study was conducted. Two reviewers experienced in internal medicine and endocrinology graded ChatGPT's Urdu and English responses to 53 questions on diabetes knowledge, lifestyle, and prevention. A senior reviewer resolved discrepancies. The Urdu responses were assessed for comprehensiveness and accuracy, and their accuracy was then compared with that of the corresponding English responses.

Results: Of the 53 Urdu responses, only two (3.8%) were graded as comprehensive and five (9.4%) as correct but inadequate. The largest single category, 25 of 53 (47.2%), was graded as mixed, containing both correct and incorrect or outdated information. In the comparison of accuracy between languages, no Urdu response (0.0%) was judged more accurate than its English counterpart, and 49 of 53 (92.5%) were judged less accurate.

Conclusion: Although ChatGPT's ability to retrieve information about diabetes is impressive, it should be used only as an adjunct rather than as a sole source of information. Further work is needed to improve Urdu responses in medical contexts before this potential can be realized.