Video Scene Detection Using Transformer Encoding Linker Network (TELNet)

Shu-Ming Tseng; Zhi-Ting Yeh; Chia-Yang Wu; Jia-Bin Chang; Mehdi Norouzi

doi:10.3390/s23167050

Sensors (Aug 2023)

Affiliations

Shu-Ming Tseng: Department of Electronic Engineering, National Taipei University of Technology, Taipei 106335, Taiwan
Zhi-Ting Yeh: College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH 45219, USA
Chia-Yang Wu: Department of Electronic Engineering, National Taipei University of Technology, Taipei 106335, Taiwan
Jia-Bin Chang: College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH 45219, USA
Mehdi Norouzi: College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH 45219, USA

DOI: https://doi.org/10.3390/s23167050
Journal volume & issue: Vol. 23, no. 16, p. 7050

Abstract

This paper introduces a transformer encoding linker network (TELNet) for automatically identifying scene boundaries in videos without prior knowledge of their structure. Videos consist of sequences of semantically related shots or chapters, and recognizing scene boundaries is crucial for various video processing tasks, including video summarization. TELNet uses a rolling window to scan through video shots. Within each window, a transformer encoder encodes the shot features, which are extracted with a fine-tuned 3D CNN model. A linker then establishes links between video shots based on these encoded features, and TELNet places scene boundaries where consecutive shots lack links. TELNet was trained on multiple video scene detection datasets and achieved results comparable to other state-of-the-art models in standard settings. Notably, in cross-dataset evaluations, TELNet achieved significantly improved F-scores. Furthermore, TELNet’s computational complexity grows linearly with the number of shots, making it efficient for processing long videos.
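
The final step described in the abstract, turning shot-to-shot links into scene boundaries, can be illustrated with a small sketch. It is not the authors' implementation. It assumes the links are already available as pairs of shot indices, and it reads "consecutive shots lack links" as "no link spans the gap between shot i and shot i+1". The function name `scene_boundaries` and the input format are hypothetical.

```python
def scene_boundaries(links, num_shots):
    """Return every index i where a scene boundary lies between shot i and shot i+1.

    links: iterable of (a, b) shot-index pairs (assumed input format).
    num_shots: total number of shots in the video.
    """
    # covered[i] is True when at least one link spans the gap between i and i+1.
    covered = [False] * max(num_shots - 1, 0)
    for a, b in links:
        lo, hi = min(a, b), max(a, b)
        for gap in range(lo, hi):
            covered[gap] = True
    # Assumption: a gap that no link spans is a scene boundary.
    return [i for i, is_covered in enumerate(covered) if not is_covered]


if __name__ == "__main__":
    # Six shots: 0-2 are linked together, 3-4 are linked, 5 links only to itself.
    example_links = [(0, 2), (1, 2), (3, 4), (5, 5)]
    print(scene_boundaries(example_links, 6))  # boundaries after shots 2 and 4
```

In this toy input the shots fall into three scenes: {0, 1, 2}, {3, 4}, and {5}.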

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors
