Long Length Document Classification by Local Convolutional Feature Aggregation

Liu Liu; Kaile Liu; Zhenghai Cong; Jiali Zhao; Yefei Ji; Jun He

doi:10.3390/a11080109

Algorithms (Jul 2018)

Affiliations

Liu Liu: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China
Kaile Liu: State Grid Corporation of China, Beijing 100031, China
Zhenghai Cong: NARI Group Corporation of China/State Grid Electric Power Research Institute, Nanjing 211106, China
Jiali Zhao: State Grid Corporation of China, Beijing 100031, China
Yefei Ji: NARI Group Corporation of China/State Grid Electric Power Research Institute, Nanjing 211106, China
Jun He: School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China

DOI: https://doi.org/10.3390/a11080109
Journal volume & issue: Vol. 11, no. 8, p. 109

Abstract

The exponential growth of online reviews and recommendations has made document classification and sentiment analysis active topics in academic and industrial research. Traditional deep-learning-based document classification methods need the full text to extract features. To handle long documents, we propose three methods that classify documents by aggregating local convolutional features. The first method randomly draws blocks of contiguous words from the full document. Each block is fed into a convolutional neural network to extract features, and the block features are then concatenated and passed to a classifier that outputs the class probabilities. The second model improves on the first by using a recurrent neural network to capture the contextual order of the sampled blocks. The third model is inspired by the recurrent attention model (RAM): a reinforcement learning module acts as a controller that selects the next block position based on the recurrent state. Experiments on a four-class arXiv paper dataset that we collected show that all three proposed models perform well, and that the RAM model achieves the best test accuracy while using the least information.
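
The first method described in the abstract (random block sampling, per-block convolutional features, concatenation, classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the vocabulary and layer sizes, the ReLU with max-over-time pooling, and the single linear softmax layer are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def sample_blocks(tokens, num_blocks, block_len, rng):
    """Draw num_blocks windows of block_len contiguous tokens at random starts."""
    if len(tokens) < block_len:
        raise ValueError("document is shorter than one block")
    starts = rng.integers(0, len(tokens) - block_len + 1, size=num_blocks)
    return [tokens[s:s + block_len] for s in starts]

def conv_block_features(block_emb, filters):
    """1D convolution over word positions, ReLU, then max-over-time pooling.

    block_emb: (block_len, emb_dim); filters: (num_filters, width, emb_dim).
    Returns a feature vector of shape (num_filters,).
    """
    width = filters.shape[1]
    steps = block_emb.shape[0] - width + 1
    # Stack every sliding window: (steps, width, emb_dim).
    windows = np.stack([block_emb[i:i + width] for i in range(steps)])
    # Contract the width and embedding axes: (steps, num_filters).
    acts = np.tensordot(windows, filters, axes=([1, 2], [1, 2]))
    return np.maximum(acts, 0.0).max(axis=0)

def classify(tokens, embeddings, filters, W, b, num_blocks, block_len, rng):
    """Concatenate per-block features and apply a linear softmax classifier."""
    blocks = sample_blocks(tokens, num_blocks, block_len, rng)
    feats = [conv_block_features(embeddings[np.asarray(blk)], filters)
             for blk in blocks]
    z = np.concatenate(feats) @ W + b
    z = z - z.max()  # subtract the max for numerical stability
    p = np.exp(z)
    return p / p.sum()

# Hypothetical sizes; only the four classes come from the abstract.
rng = np.random.default_rng(0)
vocab, emb_dim, num_filters, width = 1000, 16, 8, 3
num_blocks, block_len, num_classes = 4, 20, 4

embeddings = rng.normal(size=(vocab, emb_dim))            # untrained
filters = rng.normal(size=(num_filters, width, emb_dim))  # untrained
W = rng.normal(size=(num_blocks * num_filters, num_classes))
b = np.zeros(num_classes)

doc = rng.integers(0, vocab, size=5000)  # stand-in for a long tokenized document
probs = classify(doc, embeddings, filters, W, b, num_blocks, block_len, rng)
```

In this sketch only `num_blocks * block_len` tokens (80 of 5000 here) are ever embedded, which is the point of sampling local blocks instead of processing the full text. The second and third models in the abstract would replace the plain concatenation with a recurrent network over the block features and a learned policy for choosing block positions, respectively; neither is shown here.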

Published in Algorithms

ISSN: 1999-4893 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Technology (General): Industrial engineering. Management engineering; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.mdpi.com/journal/algorithms
