Automated Event Detection and Classification in Soccer: The Potential of Using Multiple Modalities

Olav Andre Nergård Rongved; Markus Stige; Steven Alexander Hicks; Vajira Lasantha Thambawita; Cise Midoglu; Evi Zouganeli; Dag Johansen; Michael Alexander Riegler; Pål Halvorsen

doi:10.3390/make3040051

Machine Learning and Knowledge Extraction (Dec 2021)

Affiliations

Olav Andre Nergård Rongved: Department of Computer Science, Oslo Metropolitan University, 0167 Oslo, Norway
Markus Stige: Department of Informatics, University of Oslo, 0373 Oslo, Norway
Steven Alexander Hicks: Department of Computer Science, Oslo Metropolitan University, 0167 Oslo, Norway
Vajira Lasantha Thambawita: Department of Computer Science, Oslo Metropolitan University, 0167 Oslo, Norway
Cise Midoglu: SimulaMet, 0167 Oslo, Norway
Evi Zouganeli: Department of Computer Science, Oslo Metropolitan University, 0167 Oslo, Norway
Dag Johansen: Department of Computer Science, UiT The Arctic University of Norway, 9037 Tromsø, Norway
Michael Alexander Riegler: SimulaMet, 0167 Oslo, Norway
Pål Halvorsen: SimulaMet, 0167 Oslo, Norway

DOI: https://doi.org/10.3390/make3040051
Journal volume & issue: Vol. 3, no. 4, pp. 1030–1054

Abstract

Detecting events in videos is a complex task, and many different approaches, aimed at a large variety of use-cases, have been proposed in the literature. Most approaches, however, are unimodal and only consider the visual information in the videos. This paper presents and evaluates different approaches based on neural networks where we combine visual features with audio features to detect (spot) and classify events in soccer videos. We employ model fusion to combine different modalities such as video and audio, and test these combinations against different state-of-the-art models on the SoccerNet dataset. The results show that a multimodal approach is beneficial. We also analyze how the tolerance for delays in classification and spotting time, and the tolerance for prediction accuracy, influence the results. Our experiments show that using multiple modalities improves event detection performance for certain types of events.
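
To make the idea of fusing modalities concrete, the following is a minimal sketch, not the authors' implementation. It assumes one simple form of model fusion (a weighted average of per-class probabilities from a visual model and an audio model) and assumes that the spotting tolerance is a time window centred on the annotated event. The function names, the `visual_weight` parameter, and the class labels are hypothetical and chosen only for illustration.

```python
import math

# Illustrative labels only; the actual label set depends on the dataset version.
CLASSES = ["background", "card", "goal", "substitution"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(visual_logits, audio_logits, visual_weight=0.5):
    """Weighted average of the class probabilities of two unimodal models.

    Assumption: both models score the same classes in the same order.
    """
    p_visual = softmax(visual_logits)
    p_audio = softmax(audio_logits)
    return [
        visual_weight * v + (1.0 - visual_weight) * a
        for v, a in zip(p_visual, p_audio)
    ]


def is_spotted(predicted_s, annotated_s, tolerance_s):
    """True if the predicted time lies inside a window of width tolerance_s
    centred on the annotated time (an assumed definition of tolerance)."""
    return abs(predicted_s - annotated_s) <= tolerance_s / 2.0


if __name__ == "__main__":
    fused = late_fusion([0.1, 0.2, 2.5, 0.3], [0.0, 0.1, 1.0, 0.2])
    best = max(range(len(fused)), key=fused.__getitem__)
    print(CLASSES[best], [round(p, 3) for p in fused])
    print(is_spotted(predicted_s=1204.0, annotated_s=1200.0, tolerance_s=10.0))
```

Widening `tolerance_s` accepts predictions that are further from the annotated time, which is the kind of trade-off between delay tolerance and measured performance that the abstract refers to.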

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
