Jisuanji kexue yu tansuo (Feb 2021)

Document Classification Method Based on Context Awareness and Hierarchical Attention Network

  • REN Jianhua, LI Jing, MENG Xiangfu

DOI
https://doi.org/10.3778/j.issn.1673-9418.1912048
Journal volume & issue
Vol. 15, no. 2
pp. 305 – 314

Abstract

Document classification is a fundamental problem in natural language processing (NLP). Hierarchical attention networks have made progress in recent years, but because they encode each sentence independently, the bidirectional encoder used in the model can consider only the sentences adjacent to the one being encoded, remains focused on the sentence currently being encoded, and does not effectively integrate knowledge of document structure into the architecture. To solve this problem, a document classification method based on context awareness and a hierarchical attention network (CAHAN) is proposed. The method uses a hierarchical model to mirror the hierarchical structure of a document, and uses attention mechanisms to weight the important sentences in the document and the important words in each sentence. At both the word level and the sentence level, it relies on the bidirectional encoder to obtain context information. It also introduces a context vector into the word-level attention mechanism, so that the word-level encoder makes its attention decisions based on context information, captures the context of the text more fully, and thereby extracts deep document features. In addition, a gating mechanism is used to determine how much context information should be considered. Experimental results on two standard data sets show that the proposed CAHAN model achieves better classification performance than long short-term memory (LSTM), convolutional neural network (CNN), and hierarchical attention network (HAN) models, improving the accuracy of document classification.
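The word-level attention step with a context vector and a gate can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no equations, so the additive scoring form, the per-dimension sigmoid gate, the parameter names (`W_h`, `W_c`, `G_h`, `G_c`, `b`, `u`), and the helper `random_params` are all assumptions made for illustration. The context vector `c` stands in for whatever summary of the surrounding sentences the model maintains.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    # Subtract the maximum for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(M, v):
    """Multiply a (k x d) matrix, stored as a list of rows, by a length-d vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def gated_context_attention(H, c, p):
    """Assumed form of context-aware word attention.

    H: list of T word annotations from the encoder, each of length d.
    c: context vector of length d (how it is built is not specified here).
    p: dict of illustrative parameters (see random_params).
    Returns (sentence_vector, attention_weights).
    """
    ctx = matvec(p["W_c"], c)        # projected context, length k
    gate_ctx = matvec(p["G_c"], c)   # context contribution to the gate
    scores = []
    for h in H:
        proj = matvec(p["W_h"], h)
        # Gate in (0, 1): how much context to let into this word's score.
        gate = [sigmoid(gh + gc) for gh, gc in zip(matvec(p["G_h"], h), gate_ctx)]
        hidden = [math.tanh(w + g * x + bias)
                  for w, g, x, bias in zip(proj, gate, ctx, p["b"])]
        scores.append(sum(u_i * z_i for u_i, z_i in zip(p["u"], hidden)))
    alpha = softmax(scores)          # attention weights over the T words
    d = len(H[0])
    sentence = [sum(a * h[j] for a, h in zip(alpha, H)) for j in range(d)]
    return sentence, alpha

def random_params(d, k, seed=0):
    """Hypothetical helper: small random parameters for demonstration only."""
    rng = random.Random(seed)
    mat = lambda: [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(k)]
    vec = lambda: [rng.uniform(-0.5, 0.5) for _ in range(k)]
    return {"W_h": mat(), "W_c": mat(), "G_h": mat(), "G_c": mat(),
            "b": vec(), "u": vec()}
```

Calling `gated_context_attention(H, c, random_params(d, k))` returns a weighted sum of the word annotations together with the weights themselves. When the gate is near zero the score reduces to ordinary attention over the current sentence alone, which is the behaviour the gating mechanism is meant to interpolate away from.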

Keywords