Principle-Based Approach for the De-Identification of Code-Mixed Electronic Health Records

Chen-Kai Wang; Feng-Duo Wang; You-Qian Lee; Pei-Tsz Chen; Bo-Hong Wang; Chu-Hsien Su; Joseph Chin-Chi Kuo; Chi-Shin Wu; Yi-Ling Chien; Hong-Jie Dai; Vincent S. Tseng; Wen-Lian Hsu

doi:10.1109/ACCESS.2022.3148396

IEEE Access (Jan 2022)

Affiliations

Chen-Kai Wang: Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Feng-Duo Wang: Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan
You-Qian Lee: Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
Pei-Tsz Chen: Department of Chemical Engineering, Feng Chia University, Taichung, Taiwan
Bo-Hong Wang: Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
Chu-Hsien Su: National Center for Geriatrics and Welfare Research, National Health Research Institutes, Miaoli, Taiwan
Joseph Chin-Chi Kuo: Big Data Center, China Medical University Hospital, China Medical University, Taichung, Taiwan
Chi-Shin Wu: National Center for Geriatrics and Welfare Research, National Health Research Institutes, Miaoli, Taiwan
Yi-Ling Chien: Department of Psychiatry, College of Medicine, National Taiwan University Hospital, Taipei, Taiwan
Hong-Jie Dai: Intelligent System Laboratory, Department of Electrical Engineering, College of Electrical Engineering and Computer Science, National Kaohsiung University of Science and Technology, Kaohsiung, Taiwan
Vincent S. Tseng: Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu, Taiwan
Wen-Lian Hsu: Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Asia University, Taichung, Taiwan

DOI: https://doi.org/10.1109/ACCESS.2022.3148396
Journal volume & issue: Vol. 10
pp. 22875 – 22885

Abstract

Code-mixing is a phenomenon in which at least two languages are combined within a single conversation. Mixed-language use is widespread in multilingual and multicultural countries and poses significant challenges for the development of automated language processing tools. In Taiwan’s electronic health record (EHR) systems, unstructured EHR texts are usually written in a mixture of English and Chinese, which increases the difficulty of de-identifying protected health information (PHI) and synthesizing surrogates for it. We explore this problem by applying several state-of-the-art pre-trained mono- and multilingual language models, and we propose to exploit the principle-based approach (PBA) for the tasks of PHI recognition and resynthesis on a code-mixed EHR corpus annotated with 6 main categories and 25 subcategories of PHI. A hierarchical principle slot schema is defined in the PBA to encode knowledge of code-mixed PHI; the slots are used to learn from the training set and to assemble principles that recognize PHI mentions and synthesize surrogates simultaneously. In addition, a semantic disambiguation process is implemented to resolve ambiguous PHI categories during de-identification and to dynamically extend the knowledge encoded in the PBA during knowledge augmentation. The experimental results demonstrate that the proposed method achieves the best micro- and macro-F-scores in comparison with the other mono- and multilingual language models fine-tuned on our code-mixed corpus.
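
The abstract reports both micro- and macro-F-scores. As a minimal sketch of how the two averages differ, the Python below computes them from per-category counts. The category names and counts are hypothetical and chosen only for illustration; they are not taken from the paper's corpus or results, and the function names are assumptions of this sketch rather than the authors' evaluation code.

```python
from typing import Dict, Tuple

# (true positives, false positives, false negatives) for one PHI category
Counts = Tuple[int, int, int]


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_category: Dict[str, Counts]) -> float:
    """Pool the counts over all categories, then compute a single F1."""
    tp = sum(c[0] for c in per_category.values())
    fp = sum(c[1] for c in per_category.values())
    fn = sum(c[2] for c in per_category.values())
    return f1(tp, fp, fn)


def macro_f1(per_category: Dict[str, Counts]) -> float:
    """Compute F1 per category, then take the unweighted mean."""
    scores = [f1(*c) for c in per_category.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical counts for three illustrative PHI categories.
counts: Dict[str, Counts] = {
    "NAME": (90, 10, 10),
    "DATE": (40, 5, 15),
    "ID": (2, 1, 3),
}

print(f"micro-F: {micro_f1(counts):.4f}")
print(f"macro-F: {macro_f1(counts):.4f}")
```

Because the micro average pools counts, frequent categories dominate it, whereas the macro average weights every category equally, so a rare category with poor recall lowers the macro score more visibly. Reporting both, as the abstract does, gives a view of performance on common and rare PHI types alike.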

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
