WANet: weight and attention network for video summarization

Arpan Basu; Rishav Pramanik; Ram Sarkar

doi:10.1007/s44163-024-00101-y

Discover Artificial Intelligence (Jan 2024)

Affiliations

Arpan Basu: Department of Computer Science and Engineering, Jadavpur University
Rishav Pramanik: Department of Computer Science and Engineering, Jadavpur University
Ram Sarkar: Department of Computer Science and Engineering, Jadavpur University

DOI: https://doi.org/10.1007/s44163-024-00101-y
Journal volume & issue: Vol. 4, no. 1
pp. 1 – 13

Abstract

In this paper, we propose a deep learning-based model, called Weight and Attention Network (WANet), for video summarization. The network comprises a simple multi-head attention mechanism followed by a feed-forward network, which together produce the frame importance scores. Summary keyshots are obtained from the scores using a combination of kernel temporal segmentation and the knapsack algorithm. In contrast to past methods, we first enrich the input frames with similarity information rather than letting the model learn all the features by itself. A novel weight assignment mechanism assigns weights to the input frames based on their similarity before passing them to the model. Experimental results on the SumMe and TVSum datasets indicate the effectiveness of the present method when compared to state-of-the-art methods applied to the same datasets.
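
The keyshot selection step mentioned in the abstract can be illustrated with a minimal sketch of a 0/1 knapsack over shots. This is not the authors' implementation. It assumes the shot boundaries have already been produced by a segmentation step such as kernel temporal segmentation, that each shot carries a single importance score (for example, the mean of its frame scores), and that the summary length is capped by a frame budget. The function name `select_keyshots` and the example values are hypothetical.

```python
from typing import List, Sequence


def select_keyshots(scores: Sequence[float],
                    lengths: Sequence[int],
                    budget: int) -> List[int]:
    """Pick shot indices maximizing total score with total length <= budget.

    Hypothetical helper: a standard 0/1 knapsack solved by dynamic programming.
    scores[i]  -- importance score of shot i (assumed precomputed)
    lengths[i] -- length of shot i in frames
    budget     -- maximum summary length in frames (assumed given)
    """
    n = len(scores)
    # best[i][c] = best total score using the first i shots with capacity c
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w, v = lengths[i - 1], scores[i - 1]
        for c in range(budget + 1):
            best[i][c] = best[i - 1][c]  # option 1: skip shot i-1
            if w <= c and best[i - 1][c - w] + v > best[i][c]:
                best[i][c] = best[i - 1][c - w] + v  # option 2: take it

    # Backtrack through the table to recover which shots were taken.
    chosen: List[int] = []
    c = budget
    for i in range(n, 0, -1):
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= lengths[i - 1]
    return sorted(chosen)


# Illustrative values only, not taken from the paper.
shot_scores = [0.9, 0.2, 0.8, 0.5]
shot_lengths = [40, 30, 50, 20]
selected = select_keyshots(shot_scores, shot_lengths, budget=60)
```

In this toy example the selected shots are those whose combined length fits the 60-frame budget with the highest total score. In practice the budget is commonly set as a fraction of the video length, but the exact setting used by WANet is not stated in the abstract.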

Published in Discover Artificial Intelligence

ISSN: 2731-0809 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Language and Literature: Philology. Linguistics: Computational linguistics. Natural language processing; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.springer.com/journal/44163
