Alexandria Engineering Journal (Nov 2024)
A multi-scale embedding network for unified named entity recognition in Chinese Electronic Medical Records
Abstract
Named Entity Recognition (NER) in Chinese Electronic Medical Records (EMRs) is crucial for enhancing healthcare quality and efficiency. However, the linguistic complexity of Chinese and the unstructured format of medical texts create significant challenges. To address these issues, we propose MSCNER, a unified Multi-Scale Embedding Network designed specifically for NER in Chinese EMRs. MSCNER handles these linguistic and contextual challenges with a character relation classification scheme. The model first extracts detailed contextual information through an information extraction module and a context modeler. It then applies multi-scale feature extraction to gather features at the character, word, and position levels. Additionally, a weight allocation module based on an attention mechanism improves the recognition of complex and discontinuous entities. Experimental results on three benchmark Chinese EMR datasets demonstrate that MSCNER achieves state-of-the-art performance, significantly surpassing existing models in accuracy and reliability. These findings underscore the potential of MSCNER to improve NER in medical applications, paving the way for more effective and scalable healthcare data systems and for applications to other language processing tasks.
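
To illustrate the idea of attention-based weight allocation over multi-scale features, the following minimal Python sketch fuses character-, word-, and position-level feature vectors for a single character. It is not the MSCNER implementation: the function names `softmax` and `fuse_scales`, the `query` vector, the scaled dot-product scoring, and the toy values are all assumptions made for illustration, and a real model would learn these quantities during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scales(features, query):
    """Fuse per-scale feature vectors for one character with attention weights.

    features: dict mapping a scale name to a vector (all of the same length).
    query: hypothetical learned vector used to score each scale.
    Returns (weights_by_scale, fused_vector).
    """
    names = list(features)
    dim = len(query)
    # Scaled dot-product score between each scale's vector and the query.
    scores = [
        sum(f * q for f, q in zip(features[name], query)) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Weighted sum of the scale vectors, one output dimension at a time.
    fused = [
        sum(w * features[name][k] for w, name in zip(weights, names))
        for k in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Toy two-dimensional features for a single character (illustrative values only).
features = {
    "char": [1.0, 0.0],
    "word": [0.0, 1.0],
    "position": [0.0, 0.0],
}
query = [1.0, 0.0]

weights, fused = fuse_scales(features, query)
```

In this toy setting the query aligns with the character-level vector, so that scale receives the largest weight, while the word and position scales share the remainder equally. The weights always sum to one, so the fused vector stays on the same scale as its inputs.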