Abstractive summarization model considering hybrid lexical features

Yuehua JIANG; Lei DING; Jiaoe LI; Haoxuan DU; Kai GAO

doi:10.7535/hbkd.2019yx02009

Journal of Hebei University of Science and Technology (Apr 2019)

Affiliations

Yuehua JIANG: School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China
Lei DING: Information Center of Shijiazhuang Public Security Bureau, Shijiazhuang, Hebei 050021, China
Jiaoe LI: School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China
Haoxuan DU: Xidian University, Xi'an, Shaanxi 710126, China
Kai GAO: School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, Hebei 050018, China

Journal volume & issue: Vol. 40, no. 2
pp. 152 – 158

Abstract

To exploit lexical features (n-gram and part-of-speech information) so that more key vocabulary is identified during summary generation, and thereby to further improve summary quality, this paper proposes an algorithm that combines lexical features with a sequence-to-sequence (Seq2Seq) structure and an attention mechanism. The input layer combines each word's part-of-speech vector with its word vector, and the result is fed to the encoder layer. The encoder layer is a bidirectional LSTM, and the context vector is formed from the encoder output together with a lexical feature vector extracted by a convolutional neural network (CNN). In this model, the CNN layer handles lexical information, the bidirectional LSTM handles sentence-level information, and the decoder layer uses a unidirectional LSTM to decode the context vector and generate the summary. Experiments on a public dataset and a self-collected dataset show that the summarization model considering lexical features outperforms the comparison model: on the public dataset, its ROUGE-1, ROUGE-2, and ROUGE-L scores are higher by 0.024, 0.033, and 0.030, respectively. These results indicate that summary generation depends not only on the semantics and topics of the article but also on its lexical features. The proposed model offers a useful reference for research on summary generation that integrates key information.
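The architecture described above can be illustrated with a minimal NumPy sketch of one forward pass. This is not the authors' implementation: all sizes, weights, token ids, and function names (`encode`, `lexical_features`, `context`, `greedy_decode`) are hypothetical, the weights are random and untrained, and several details are assumptions not stated in the abstract: the word and POS vectors are concatenated, the CNN uses n-gram widths 1 to 3 with max-over-time pooling, attention uses a bilinear score, the lexical vector is simply concatenated to the attention-weighted encoder state to form the context vector, and the decoder is fed its previous context (input feeding) with greedy decoding.

```python
import numpy as np

# Hypothetical sizes; all weights are random and untrained (shape sketch only).
V, P = 20, 5                      # vocabulary size, number of POS tags
D_W, D_POS, D_H = 8, 4, 6         # word-vector, POS-vector, LSTM hidden sizes
WIDTHS, N_FILT = (1, 2, 3), 5     # assumed n-gram widths and filters per width
D_IN = D_W + D_POS
D_ENC, D_LEX = 2 * D_H, N_FILT * len(WIDTHS)
D_CTX = D_ENC + D_LEX
rng = np.random.default_rng(0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_lstm(d_in):
    return {"W": rng.normal(0.0, 0.1, (4 * D_H, d_in + D_H)), "b": np.zeros(4 * D_H)}


def lstm_step(p, x, h, c):
    # input, forget, output gates and candidate cell, all computed from [x; h]
    i, f, o, g = np.split(p["W"] @ np.concatenate([x, h]) + p["b"], 4)
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    return sigmoid(o) * np.tanh(c), c


def lstm_run(p, xs):
    h = c = np.zeros(D_H)
    hs = []
    for x in xs:
        h, c = lstm_step(p, x, h, c)
        hs.append(h)
    return np.stack(hs)                                   # (T, D_H)


E_word = rng.normal(0.0, 0.1, (V, D_W))
E_pos = rng.normal(0.0, 0.1, (P, D_POS))
fwd, bwd = init_lstm(D_IN), init_lstm(D_IN)
filters = {k: rng.normal(0.0, 0.1, (N_FILT, k * D_IN)) for k in WIDTHS}
W_att = rng.normal(0.0, 0.1, (D_ENC, D_H))
dec = init_lstm(D_W + D_CTX)
W_out = rng.normal(0.0, 0.1, (V, D_H + D_CTX))


def lexical_features(xs):
    # CNN over k-word windows, ReLU, then max-pooling over time for each width k
    feats = []
    for k, F in filters.items():
        windows = np.stack([xs[t:t + k].ravel() for t in range(len(xs) - k + 1)])
        feats.append(np.maximum(windows @ F.T, 0.0).max(axis=0))
    return np.concatenate(feats)                          # (D_LEX,)


def encode(word_ids, pos_ids):
    # input layer: word vector combined (here: concatenated) with POS vector
    xs = np.concatenate([E_word[word_ids], E_pos[pos_ids]], axis=1)
    # BiLSTM: forward states and re-reversed backward states, side by side
    enc = np.concatenate([lstm_run(fwd, xs), lstm_run(bwd, xs[::-1])[::-1]], axis=1)
    return enc, lexical_features(xs)                      # (T, D_ENC), (D_LEX,)


def context(enc, lex, s):
    alpha = softmax(enc @ (W_att @ s))                    # bilinear attention over positions
    return np.concatenate([alpha @ enc, lex]), alpha      # context = [attended enc; lex]


def greedy_decode(word_ids, pos_ids, bos=0, max_len=5):
    enc, lex = encode(np.asarray(word_ids), np.asarray(pos_ids))
    s = c = np.zeros(D_H)
    ctx, y, out = np.zeros(D_CTX), bos, []
    for _ in range(max_len):
        # unidirectional LSTM decoder fed the previous token and previous context
        s, c = lstm_step(dec, np.concatenate([E_word[y], ctx]), s, c)
        ctx, _ = context(enc, lex, s)
        y = int(np.argmax(W_out @ np.concatenate([s, ctx])))   # greedy choice
        out.append(y)
    return out
```

For example, `greedy_decode([3, 7, 1, 9, 4, 12], [0, 1, 2, 1, 3, 4])` returns five token ids; because the weights are untrained, the ids carry no meaning, and the sketch only shows how the pieces fit together. Reversing the input for the backward LSTM and re-reversing its outputs aligns both directions at each source position, so every row of the encoder output describes the same word.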
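The ROUGE gains quoted above measure n-gram and longest-common-subsequence overlap between generated and reference summaries. The following pure-Python sketch computes ROUGE-1, ROUGE-2, and ROUGE-L as F1 scores over pre-tokenized text. It is a simplified illustration under stated assumptions: whitespace tokens, no stemming or stopword removal, and the function names are hypothetical; the abstract does not say whether the reported figures are recall or F1, or which ROUGE implementation was used.

```python
from collections import Counter


def ngrams(tokens, n):
    # multiset of contiguous n-grams
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1(overlap, n_cand, n_ref):
    if overlap == 0:
        return 0.0
    p, r = overlap / n_cand, overlap / n_ref
    return 2 * p * r / (p + r)


def rouge_n(cand, ref, n):
    c, r = ngrams(cand, n), ngrams(ref, n)
    overlap = sum((c & r).values())   # clipped matches: min count per n-gram
    return f1(overlap, sum(c.values()), sum(r.values()))


def lcs_len(a, b):
    # dynamic programming over prefixes, keeping one row at a time
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            cur.append(prev[j] + 1 if x == y else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def rouge_l(cand, ref):
    return f1(lcs_len(cand, ref), len(cand), len(ref))
```

For the candidate "the cat sat on the mat" against the reference "the cat is on the mat" (split on spaces), `rouge_n(cand, ref, 1)` and `rouge_l(cand, ref)` both give 5/6 (about 0.833), and `rouge_n(cand, ref, 2)` gives 0.6.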

Published in Journal of Hebei University of Science and Technology

ISSN: 1008-1542 (Print)
Publisher: Hebei University of Science and Technology
Country of publisher: China
LCC subjects: Technology
Website: http://xuebao.hebust.edu.cn/hbkjdxen/ch/index.aspx
