Improving the quality of Persian clinical text with a novel spelling correction system

Seyed Mohammad Sadegh Dashti; Seyedeh Fatemeh Dashti

doi:10.1186/s12911-024-02613-0

BMC Medical Informatics and Decision Making (Aug 2024)

Affiliations

Seyed Mohammad Sadegh Dashti: Department of Computer Engineering, Kerman Branch, Islamic Azad University
Seyedeh Fatemeh Dashti: Department of Advanced Research, Bushehr University of Medical Sciences

DOI: https://doi.org/10.1186/s12911-024-02613-0
Journal volume & issue: Vol. 24, no. 1, pp. 1–21

Abstract

Background: The accuracy of spelling in Electronic Health Records (EHRs) is a critical factor for efficient clinical care, research, and ensuring patient safety. The Persian language, with its abundant vocabulary and complex characteristics, poses unique challenges for real-word error correction. This research aimed to develop an innovative approach for detecting and correcting spelling errors in Persian clinical text.

Methods: Our strategy employs a state-of-the-art pre-trained model that has been meticulously fine-tuned specifically for the task of spelling correction in the Persian clinical domain. This model is complemented by an innovative orthographic similarity matching algorithm, PERTO, which uses visual similarity of characters for ranking correction candidates.

Results: The evaluation of our approach demonstrated its robustness and precision in detecting and rectifying word errors in Persian clinical text. In terms of non-word error correction, our model achieved an F1-Score of 90.0% when the PERTO algorithm was employed. For real-word error detection, our model demonstrated its highest performance, achieving an F1-Score of 90.6%. Furthermore, the model reached its highest F1-Score of 91.5% for real-word error correction when the PERTO algorithm was employed.

Conclusions: Despite certain limitations, our method represents a substantial advancement in the field of spelling error detection and correction for Persian clinical text. By effectively addressing the unique challenges posed by the Persian language, our approach paves the way for more accurate and efficient clinical documentation, contributing to improved patient care and safety. Future research could explore its use in other areas of the Persian medical domain, enhancing its impact and utility.
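
The abstract says only that PERTO ranks correction candidates by the visual similarity of characters; it does not give the algorithm. The general idea can be illustrated with a minimal Python sketch: an edit distance in which substituting one letter for a visually similar one costs less than an arbitrary substitution. Everything here is an assumption made for illustration and is not taken from the paper: the letter groups (Persian letters that share a base shape and differ mainly in dots or strokes), the 0.5 discount, and the function names.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical groups of Persian letters with a shared base shape.
# Illustrative assumption only; this is NOT the similarity table used by PERTO.
SIMILAR_GROUPS: List[str] = [
    "بپتث", "جچحخ", "دذ", "رزژ", "سش", "صض", "طظ", "عغ", "فق", "کگ",
]

# Map each letter to the index of its group for constant-time lookup.
GROUP_OF: Dict[str, int] = {
    ch: idx for idx, group in enumerate(SIMILAR_GROUPS) for ch in group
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing character a with b (0.5 discount is an assumption)."""
    if a == b:
        return 0.0
    group_a: Optional[int] = GROUP_OF.get(a)
    if group_a is not None and group_a == GROUP_OF.get(b):
        return 0.5  # visually similar letters are cheaper to confuse
    return 1.0


def weighted_distance(source: str, target: str) -> float:
    """Levenshtein distance with the similarity-aware substitution cost."""
    prev = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [float(i)]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + 1.0,                       # delete s
                curr[j - 1] + 1.0,                   # insert t
                prev[j - 1] + substitution_cost(s, t),  # substitute s -> t
            ))
        prev = curr
    return prev[-1]


def rank_candidates(word: str, candidates: List[str]) -> List[Tuple[str, float]]:
    """Return candidates sorted from most to least similar to the input word."""
    scored = [(c, weighted_distance(word, c)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1])
```

Under these assumptions, a candidate that differs from the typed word only by a same-group letter is ranked ahead of one that differs by an unrelated letter. In a full system such a score would presumably be combined with the language model's contextual probability rather than used on its own.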

Published in BMC Medical Informatics and Decision Making

ISSN: 1472-6947 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: http://bmcmedinformdecismak.biomedcentral.com
