IEEE Access (Jan 2020)
A Neural Network Architecture for Information Extraction in Chinese Drug Package Insert
Abstract
Photocopied medical materials contain a great deal of useful information, and extracting and identifying it correctly is of great significance for the development of digital healthcare. Most previous research has focused on clinical data, and there has been little discussion of information extraction from Chinese drug package inserts. To address this gap, this paper proposes a neural network model that takes OCR output documents as its data source and both corrects the recognized text and classifies its sentences. The model is composed of three layers: the first layer corrects the data using a language model and a seq2seq model, the second layer is a convolutional neural network (CNN) that enriches the representation of the corrected sentences, and the third layer determines the label of each sentence. Quantitative experimental results verify the feasibility and validity of the proposed model. In addition, comparative experiments demonstrate that our method outperforms conventional rule-based approaches, achieving an F1 score that is 4%-6% higher.
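The three-layer data flow described above can be illustrated with a minimal sketch. It is not the model itself: the function names (`run_pipeline`, `toy_correct`, `toy_encode`, `toy_classify`), the label names, and the English example sentences are hypothetical placeholders, and each stage is a trivial stand-in for the neural component it represents (language model plus seq2seq correction, CNN encoding, sentence labeling).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LabeledSentence:
    text: str   # sentence after OCR correction
    label: str  # predicted sentence category


def run_pipeline(
    ocr_sentences: List[str],
    correct: Callable[[str], str],
    encode: Callable[[str], List[float]],
    classify: Callable[[List[float]], str],
) -> List[LabeledSentence]:
    """Apply the three stages in order to every OCR sentence."""
    results = []
    for raw in ocr_sentences:
        fixed = correct(raw)        # layer 1: correct OCR errors
        features = encode(fixed)    # layer 2: build a richer representation
        results.append(LabeledSentence(fixed, classify(features)))  # layer 3
    return results


# Hypothetical stand-ins; a real system would plug trained models in here.
def toy_correct(sentence: str) -> str:
    # Assumed OCR confusion for illustration only: "rng" misread for "mg".
    return sentence.replace("rng", "mg")


def toy_encode(sentence: str) -> List[float]:
    # Two hand-made features: sentence length and number of "mg" mentions.
    return [float(len(sentence)), float(sentence.count("mg"))]


def toy_classify(features: List[float]) -> str:
    # Label a sentence "dosage" if it mentions a milligram amount.
    return "dosage" if features[1] > 0 else "other"


demo = run_pipeline(
    ["Take 5 rng daily", "Store in a cool place"],
    toy_correct,
    toy_encode,
    toy_classify,
)
for item in demo:
    print(item.label, "|", item.text)
```

Passing the stages as callables keeps the sketch independent of any particular framework; each placeholder could be replaced by a trained model without changing `run_pipeline`.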
Keywords