COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Athanasia Zlatintsi; Petros Koutras; Georgios Evangelopoulos; Nikolaos Malandrakis; Niki Efthymiou; Katerina Pastra; Alexandros Potamianos; Petros Maragos

doi:10.1186/s13640-017-0194-1

EURASIP Journal on Image and Video Processing (Aug 2017)

Affiliations

Athanasia Zlatintsi: School of Electrical and Computer Engineering, National Technical University of Athens
Petros Koutras: School of Electrical and Computer Engineering, National Technical University of Athens
Georgios Evangelopoulos: McGovern Institute for Brain Research at MIT
Nikolaos Malandrakis: Signal Analysis and Interpretation Laboratory (SAIL), USC
Niki Efthymiou: School of Electrical and Computer Engineering, National Technical University of Athens
Katerina Pastra: Cognitive Systems Research Institute
Alexandros Potamianos: School of Electrical and Computer Engineering, National Technical University of Athens
Petros Maragos: School of Electrical and Computer Engineering, National Technical University of Athens

Journal volume & issue: Vol. 2017, no. 1
pp. 1 – 24

Abstract

Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The purpose of this database is manifold: it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, as well as for emotion tracking. In order to enable comparisons with other computational models, we propose state-of-the-art algorithms, specifically a unified energy-based audio-visual framework and a method for text saliency computation, for the detection of perceptually salient events from videos. Additionally, a movie summarization system for the automatic production of summaries is presented. Two kinds of evaluation were performed: an objective evaluation based on the saliency annotation of the database, and an extensive qualitative human evaluation of the automatically produced summaries, in which we investigated what composes high-quality movie summaries. Both evaluations verified the appropriateness of the proposed methods. The annotation of the database and the code for the summarization system can be found at http://cognimuse.cs.ntua.gr/database.

Published in EURASIP Journal on Image and Video Processing

ISSN: 1687-5176 (Print); 1687-5281 (Online)
Publisher: SpringerOpen
Country of publisher: United Kingdom
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics
Website: https://jivp-eurasipjournals.springeropen.com
