Multitask Learning-Based Affective Prediction for Videos of Films and TV Scenes

Zhibin Su; Shige Lin; Luyue Zhang; Yiming Feng; Wei Jiang

doi:10.3390/app14114391

Applied Sciences (May 2024)

Affiliations

Zhibin Su: State Key Laboratory of Media Convergence and Communication, Beijing 100024, China
Shige Lin: School of Information and Communication Engineering, Communication University of China, Beijing 100024, China
Luyue Zhang: Key Laboratory of Acoustic Visual Technology and Intelligent Control System, Ministry of Culture and Tourism, Beijing 100024, China
Yiming Feng: Key Laboratory of Acoustic Visual Technology and Intelligent Control System, Ministry of Culture and Tourism, Beijing 100024, China
Wei Jiang: State Key Laboratory of Media Convergence and Communication, Beijing 100024, China

Journal volume & issue: Vol. 14, no. 11
p. 4391

Abstract

Film and TV video scenes contain rich art and design elements, such as light and shadow, color, and composition, and convey complex affects. To recognize the fine-grained affects carried by this artistic medium, this paper proposes a multitask affective value prediction model based on an attention mechanism. After comparing the characteristics of different models, a multitask prediction framework is constructed on an improved progressive layered extraction (PLE) architecture, termed multi-headed attention and factor correlation-based PLE, which incorporates a multi-headed self-attention mechanism and a correlation analysis of affective factors. Both dynamic and static video features are fused as input, and two training tasks are defined: regression of fine-grained affect values and classification of whether a character appears in the video. Because different affects are correlated, we propose a loss function based on association constraints, which addresses the problem of balancing training within tasks. Experimental results on a self-built video dataset show that the algorithm fully exploits the complementary strengths of the different features and improves prediction accuracy, making it more suitable for fine-grained affect mining in film and TV scenes.
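The abstract combines three ingredients in one training objective: a regression loss over several affect values, a classification loss for character presence, and a constraint tied to the correlation between affects. The sketch below shows one way such a combined objective could be written, assuming mean squared error for the regression task, binary cross-entropy for the classification task, and a penalty that pulls the batch correlation of predicted affects toward a prior correlation matrix. All function names, the weights `w_reg`, `w_cls`, `lam`, and the form of the correlation penalty are hypothetical illustrations, not the paper's published loss. It uses plain Python so it runs without dependencies; a real model would compute the same terms on framework tensors.

```python
import math
from itertools import combinations


def mse(preds, targets):
    """Mean squared error over every sample and affect dimension (regression task)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(preds, targets):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def bce_with_logits(logits, labels):
    """Stable binary cross-entropy for the character-presence task (labels are 0 or 1)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Equals -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))] without overflow.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; returns 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_penalty(preds, prior_corr):
    """Hypothetical association constraint: mean squared gap between the batch
    correlation of each pair of predicted affects and a prior correlation matrix."""
    num_affects = len(preds[0])
    columns = [[row[d] for row in preds] for d in range(num_affects)]
    pairs = list(combinations(range(num_affects), 2))
    if not pairs:  # a single affect dimension has no pairs to constrain
        return 0.0
    gap = sum((pearson(columns[i], columns[j]) - prior_corr[i][j]) ** 2 for i, j in pairs)
    return gap / len(pairs)


def multitask_loss(affect_preds, affect_targets, char_logits, char_labels,
                   prior_corr, w_reg=1.0, w_cls=1.0, lam=0.1):
    """Weighted sum of regression, classification, and correlation terms (illustrative weights)."""
    return (w_reg * mse(affect_preds, affect_targets)
            + w_cls * bce_with_logits(char_logits, char_labels)
            + lam * correlation_penalty(affect_preds, prior_corr))


# Usage: 3 clips, 2 affect dimensions, predictions equal to targets and perfectly correlated.
preds = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
loss = multitask_loss(preds, preds, [0.0, 0.0, 0.0], [1, 0, 1],
                      prior_corr=[[1.0, 1.0], [1.0, 1.0]])
print(round(loss, 4))  # → 0.6931 (only the classification term is nonzero)
```

Here the regression and correlation terms vanish, so the loss reduces to the cross-entropy of logits at zero, ln 2. Tuning `lam` trades fitting each affect independently against respecting the assumed correlation structure among affects.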

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
