Information Extraction for Intestinal Cancer Electronic Medical Records

Sufen Wang; Minmin Pang; Changqing Pan; Junyi Yuan; Bo Xu; Ming Du; Hong Zhang

doi:10.1109/ACCESS.2020.3005684

IEEE Access (Jan 2020)

Affiliations

Sufen Wang: Glorious Sun School of Business and Management, Donghua University, Shanghai, China
Minmin Pang: School of Computer Science and Technology, Donghua University, Shanghai, China
Changqing Pan: Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
Junyi Yuan: Shanghai Chest Hospital, Shanghai Jiao Tong University, Shanghai, China
Bo Xu: School of Computer Science and Technology, Donghua University, Shanghai, China
Ming Du: School of Computer Science and Technology, Donghua University, Shanghai, China
Hong Zhang: School of Computer Science and Technology, Donghua University, Shanghai, China

DOI: https://doi.org/10.1109/ACCESS.2020.3005684
Journal volume & issue: Vol. 8
pp. 125923 – 125934

Abstract

Structured electronic medical records make medical data easier to mine and extract, and they are an effective way to put valuable data resources to use. However, hospitals have accumulated a large amount of unstructured data in electronic medical records that cannot be searched effectively, which wastes these resources. In this paper, we study the problem of extracting attribute values from the unstructured text in electronic medical records. From observation of intestinal cancer diagnostic texts, we divide the attributes into two categories: discriminative attributes, whose values we extract with text classification, and extractive attributes, whose values we extract with sequence labeling. For discriminative attributes, we first divide the text into sentences or segments, which serve as instances. Second, we fine-tune pre-trained word embeddings to capture domain-specific semantics and knowledge. Third, we use an attention mechanism to select the most important instance for each attribute extractor. Finally, we use multi-task learning to share useful information across tasks and improve the experimental results. For extractive attributes, we propose a novel model consisting of a BiLSTM layer, a CNN layer, and a CRF layer: the BiLSTM and CNN learn text features, and the CRF serves as the final layer. Experiments show that our method outperforms several competitive baseline methods.
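
The instance-splitting and attention steps described for discriminative attributes can be illustrated with a minimal sketch. It is not the authors' model: the keyword vocabulary `VOCAB`, the count-based `encode` function, and the hand-written query vector are hypothetical stand-ins for the fine-tuned word embeddings and learned attention parameters, and all function names are assumptions made for illustration. The sketch only shows how a per-attribute query can weight sentence-level instances with a softmax and pick the most relevant one.

```python
import math
import re

# Hypothetical keyword vocabulary; a stand-in for learned word embeddings.
VOCAB = ["tumor", "metastasis", "lymph"]


def split_instances(text):
    """Split a diagnostic text into sentence/segment instances."""
    parts = re.split(r"[.;\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def encode(instance):
    """Toy encoder: count each vocabulary word in the instance."""
    tokens = re.findall(r"[a-z]+", instance.lower())
    return [float(tokens.count(word)) for word in VOCAB]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(instance_vecs, query):
    """Dot-product attention: return weights and the weighted sum of instances."""
    scores = [sum(q * x for q, x in zip(query, vec)) for vec in instance_vecs]
    weights = softmax(scores)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, instance_vecs))
        for i in range(len(query))
    ]
    return weights, pooled


def most_important_instance(text, query):
    """Return the instance with the highest attention weight for one attribute."""
    instances = split_instances(text)
    weights, _ = attend([encode(s) for s in instances], query)
    best = max(range(len(instances)), key=lambda i: weights[i])
    return instances[best], weights


if __name__ == "__main__":
    report = (
        "Tumor located in the sigmoid colon. "
        "No distant metastasis observed; lymph nodes negative."
    )
    # Hypothetical query vector for a "metastasis" attribute extractor.
    instance, weights = most_important_instance(report, [0.0, 3.0, 0.0])
    print(instance, weights)
```

In a multi-task setting, each attribute extractor would hold its own query vector while sharing the encoder, which is one plausible reading of how useful information is shared between tasks.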

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
