IEEE Access (Jan 2024)
AESHML: An Automatic Editing Method for Soccer Match Highlights Using Multimodal Learning
Abstract
As short videos grow more prevalent, viewers increasingly seek concise, engaging segments of video content. Although a soccer match often runs longer than 90 minutes, it is the brief highlight moments that capture viewers’ attention. Traditional editing methods, however, are time-consuming and labor-intensive, which makes it difficult to meet viewers’ demand for immediate content. To address this challenge, this paper presents an automatic editing method for soccer match highlights using multimodal learning (AESHML). The method integrates video and audio features and captures highlight moments in soccer events with Transformer and long short-term memory (LSTM) multimodal learning models. On the self-constructed soccer match highlight moments dataset (SHD-114), AESHML achieves an accuracy of 83.95% and an F1 score of 82.71%. The method reduces editing time and labor costs, allowing viewers to reach the key moments of a match more quickly. It also promotes the dissemination of soccer events, enabling a broader audience to appreciate the sport.
Keywords