Journal of King Saud University: Computer and Information Sciences (Sep 2023)

GWBNER: A named entity recognition method based on character glyph and word boundary features for Chinese EHRs

  • Jinsong Zhang,
  • Xiaomei Yu,
  • Zhichao Wang,
  • Xiangwei Zheng

Journal volume & issue
Vol. 35, no. 8
p. 101654

Abstract


Electronic Health Records (EHRs) contain unprecedented volumes of health-related data, such as diagnosis and treatment information. Building on named entity recognition (NER) technologies, EHR mining has become a research focus in the health domain. Nevertheless, complex medical entities are difficult to recognize, especially in Chinese EHRs. In this paper, we present a Glyph and Word Boundary-based Named Entity Recognition (GWBNER) method that takes into account both Chinese character glyph features and word boundary features in Chinese EHRs. Specifically, character glyphs are used to capture character-level global structural features, and word boundaries are used to extract word-level local structural features. Medical entities in Chinese EHR texts are thus recognized using rich semantic features drawn from diverse perspectives. We conduct extensive experiments to evaluate the approach, and GWBNER achieves F1 scores of 0.879, 0.998, and 0.962 on three EHR datasets, respectively. The experimental results show that GWBNER outperforms state-of-the-art models in the NER community.
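To illustrate what a character-level word-boundary feature can look like, the sketch below assigns each character of a segmented Chinese sentence a BMES tag (Begin, Middle, End, Single). BMES tagging is a common boundary encoding in Chinese NER, but it is an assumption here: the abstract does not specify how GWBNER represents word boundaries. The segmentation is also assumed to come from some external segmenter or medical lexicon, and the names `boundary_tags`, `boundary_ids`, and `TAG_IDS` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from boundary tag to integer ID (assumption, not from the paper).
TAG_IDS: Dict[str, int] = {"B": 0, "M": 1, "E": 2, "S": 3}


def boundary_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Pair each character of a segmented sentence with a BMES boundary tag.

    B = first character of a multi-character word, M = interior character,
    E = last character, S = single-character word. Empty strings are skipped.
    """
    tags: List[str] = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        elif len(word) > 1:
            tags.append("B")
            tags.extend(["M"] * (len(word) - 2))  # zero or more interior characters
            tags.append("E")
    chars = "".join(words)
    return list(zip(chars, tags))  # one (character, tag) pair per character


def boundary_ids(words: List[str]) -> List[int]:
    """Convert the BMES tags of a segmented sentence into integer IDs."""
    return [TAG_IDS[tag] for _, tag in boundary_tags(words)]


if __name__ == "__main__":
    # "患者 / 有 / 胰腺炎" ("the patient has pancreatitis"), pre-segmented by assumption.
    sentence = ["患者", "有", "胰腺炎"]
    print(boundary_tags(sentence))
    print(boundary_ids(sentence))
```

In a neural tagger, such IDs would typically be mapped to learned embeddings and combined with the other per-character features before sequence labeling; how GWBNER actually fuses boundary and glyph information is described in the full paper, not here.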

Keywords