An optimized hybrid deep learning model based on word embeddings and statistical features for extractive summarization

Yaser M. Wazery; Marwa E. Saleh; Abdelmgeid A. Ali

Journal of King Saud University: Computer and Information Sciences (Jul 2023)

Affiliations

Yaser M. Wazery: Faculty of Computers and Information, Minia University, Minia, Egypt
Marwa E. Saleh (corresponding author): Faculty of Computers and Information, Minia University, Minia, Egypt
Abdelmgeid A. Ali: Faculty of Computers and Information, Minia University, Minia, Egypt

Journal volume & issue: Vol. 35, no. 7, p. 101614

Abstract

Extractive summarization has recently gained significant attention as a classification problem at the sentence level. Most current summarization methods rely on only one way of representing the sentences in a document (e.g., extracted features, word embeddings, or BERT embeddings). However, classification performance and summary quality can be improved by combining two ways of representing sentences. This paper presents a novel extractive text summarization method based on word embeddings and statistical features of a single document. Each sentence is encoded using a Convolutional Neural Network (CNN) and a Feed-Forward Neural Network (FFNN), based on word embeddings and statistical features, respectively. The CNN and FFNN outputs are concatenated, and a Multilayer Perceptron (MLP) classifies the sentence. In addition, the hybrid model's hyperparameters are optimized with KerasTuner to determine the most efficient hybrid model. The proposed method was evaluated on the standard Newsroom dataset. Experiments show that the proposed method effectively captures the document’s semantic and statistical information and outperforms deep learning, machine learning, and state-of-the-art approaches, with scores of 78.64, 74.05, and 72.08 for ROUGE-1, ROUGE-2, and ROUGE-L, respectively.
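
The two-branch encoding described in the abstract can be illustrated with a minimal forward-pass sketch in NumPy. This is not the authors' implementation. It assumes the CNN branch reads the word embeddings and the FFNN branch reads the statistical features. All layer sizes, the random weights, and the names `cnn_encode`, `dense`, `init_params`, and `score_sentence` are hypothetical and chosen only for illustration. Training and the hyperparameter search that the abstract attributes to KerasTuner are not shown.

```python
import numpy as np

# Hypothetical sizes; the abstract does not state the tuned values.
EMB_DIM, WIDTH, NUM_FILTERS = 50, 3, 8    # CNN branch (word embeddings)
NUM_STATS, FFNN_UNITS = 6, 4              # FFNN branch (statistical features)
HIDDEN = 8                                # MLP classifier

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cnn_encode(embeddings, filters, bias):
    """1-D convolution over word positions, then max-over-time pooling.

    embeddings: (num_words, emb_dim), with num_words >= filter width.
    filters:    (width, emb_dim, num_filters).
    """
    width = filters.shape[0]
    num_windows = embeddings.shape[0] - width + 1
    # Stack every window of `width` consecutive word vectors.
    windows = np.stack([embeddings[i:i + width] for i in range(num_windows)])
    # Contract the (width, emb_dim) axes: result is (num_windows, num_filters).
    maps = relu(np.tensordot(windows, filters, axes=([1, 2], [0, 1])) + bias)
    return maps.max(axis=0)

def dense(x, weights, bias, activation):
    return activation(x @ weights + bias)

def init_params(rng):
    """Random weights stand in for trained ones (assumption for the demo)."""
    def w(*shape):
        return 0.1 * rng.normal(size=shape)
    return {
        "conv_w": w(WIDTH, EMB_DIM, NUM_FILTERS), "conv_b": np.zeros(NUM_FILTERS),
        "ffnn_w": w(NUM_STATS, FFNN_UNITS), "ffnn_b": np.zeros(FFNN_UNITS),
        "mlp_w1": w(NUM_FILTERS + FFNN_UNITS, HIDDEN), "mlp_b1": np.zeros(HIDDEN),
        "mlp_w2": w(HIDDEN), "mlp_b2": 0.0,
    }

def score_sentence(embeddings, stat_features, params):
    """Return the probability that a sentence belongs in the summary."""
    semantic = cnn_encode(embeddings, params["conv_w"], params["conv_b"])
    statistical = dense(stat_features, params["ffnn_w"], params["ffnn_b"], relu)
    joint = np.concatenate([semantic, statistical])   # fuse both views
    hidden = dense(joint, params["mlp_w1"], params["mlp_b1"], relu)
    return float(sigmoid(hidden @ params["mlp_w2"] + params["mlp_b2"]))

rng = np.random.default_rng(0)
params = init_params(rng)
sentence = rng.normal(size=(12, EMB_DIM))   # 12 words, placeholder embeddings
stats = rng.random(NUM_STATS)               # placeholder statistical features
probability = score_sentence(sentence, stats, params)
print(f"inclusion probability: {probability:.3f}")
```

In an extractive setting, a score like this would be computed for every sentence of a document, and the highest-scoring sentences would form the summary. The selection rule is not specified in the abstract.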

Published in Journal of King Saud University: Computer and Information Sciences

ISSN: 1319-1578 (Print)
Publisher: Elsevier
Country of publisher: Saudi Arabia
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.journals.elsevier.com/journal-of-king-saud-university-computer-and-information-sciences/
