IEEE Access (Jan 2020)

Information Extraction for Intestinal Cancer Electronic Medical Records

  • Sufen Wang,
  • Minmin Pang,
  • Changqing Pan,
  • Junyi Yuan,
  • Bo Xu,
  • Ming Du,
  • Hong Zhang

DOI
https://doi.org/10.1109/ACCESS.2020.3005684
Journal volume & issue
Vol. 8
pp. 125923 – 125934

Abstract

Structured electronic medical records make medical data easier to mine and extract, and structuring them is an effective way to put valuable data resources to use. However, hospitals have accumulated a large amount of unstructured data in electronic medical records. This data cannot be searched effectively, so much of it goes to waste. In this paper, we study the problem of extracting attribute values from the unstructured text in electronic medical records. Based on our observation of intestinal cancer diagnostic texts, we divide the attributes into two categories, discriminative attributes and extractive attributes, and treat their extraction as a text classification problem and a sequence labeling problem, respectively. For discriminative attributes, we first divide the text into sentences or segments, which serve as instances. Second, we fine-tune the pre-trained word embeddings to capture domain-specific semantics and knowledge. Third, we use an attention mechanism to select the most important instance for each attribute extractor. Finally, we use multi-task learning to share useful information across extractors and improve the experimental results. For extractive attributes, we propose a novel model consisting of a BiLSTM layer, a CNN layer and a CRF layer: BiLSTM and CNN learn the text features, and CRF serves as the last layer of the model. Experiments show that our method outperforms several competitive baseline methods.
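As a rough illustration of the instance-selection step for discriminative attributes, the sketch below applies plain dot-product attention over a handful of instance vectors. It is a minimal sketch, not the model described in the paper: the function names `softmax` and `attend`, the toy vectors, and the choice of dot-product scoring are all assumptions made for illustration, and the abstract does not specify how the attention scores are computed.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(instances, query):
    """Weight each instance vector by its similarity to an attribute query.

    instances: list of equal-length vectors, one per sentence/segment
               (hypothetical encodings, not real embeddings).
    query:     vector standing in for one attribute extractor (assumed).
    Returns the attention weights and the weighted sum of the instances.
    """
    # Dot-product score between each instance and the attribute query.
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in instances]
    weights = softmax(scores)
    # Attention-weighted sum of the instance vectors.
    dim = len(query)
    pooled = [
        sum(w * vec[i] for w, vec in zip(weights, instances))
        for i in range(dim)
    ]
    return weights, pooled


# Toy example: three sentence instances encoded as 2-d vectors.
instances = [[0.1, 0.0], [2.0, 1.0], [0.0, 0.3]]
query = [1.0, 1.0]
weights, pooled = attend(instances, query)
best = max(range(len(weights)), key=weights.__getitem__)
```

Here `best` is the index of the instance with the largest attention weight. In a multi-task setup, each attribute would have its own query vector over shared instance encodings, which is one plausible reading of how information is shared between extractors.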

Keywords