IEEE Access (Jan 2021)
Video Description: Datasets & Evaluation Metrics
Abstract
The rapid expansion of deep learning has produced a wide range of proposals and open problems in the area of video description, particularly in recent years. Video description can be framed as automatically localizing events and generating textual descriptions for the complex and diverse visual content of a video, bridging the two leading fields of computer vision and natural language processing. Many sequence-to-sequence approaches split the task into two stages: encoding, i.e., extracting and learning representations of the visual content, and decoding, i.e., transforming the learned representations into a sequence of words, one at a time. Deep learning approaches have gained considerable recognition owing to their computational power and strong performance. However, the success of these algorithms depends strongly on the nature, diversity, and amount of data on which they are trained, validated, and tested. Techniques applied to insufficient or inadequate training and test data cannot deliver reliable conclusions, which in turn makes it difficult to assess the quality of the generated results. This survey focuses explicitly on the benchmark datasets and evaluation metrics developed and deployed for video description tasks, along with their capabilities and limitations. Finally, we conclude with the essential enhancements that are needed and promising research directions on the topic.
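Since the survey centers on evaluation metrics for generated descriptions, the following minimal sketch illustrates how one widely used metric, sentence-level BLEU, can be computed for a model-generated caption against human reference captions using NLTK. The captions, weights, and smoothing choice shown are illustrative assumptions for demonstration, not data or methodology taken from the survey.

# Minimal sketch (assumption, not from the survey): scoring one generated
# video caption against human reference captions with sentence-level BLEU.
# The example captions below are purely illustrative.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Human-written reference descriptions for a video clip (tokenized).
references = [
    "a man is slicing an onion in the kitchen".split(),
    "someone is cutting an onion on a board".split(),
]

# Caption produced by a video description model (tokenized).
hypothesis = "a man is cutting an onion".split()

# BLEU-4 with uniform n-gram weights; smoothing avoids zero scores
# when a higher-order n-gram has no match in the short hypothesis.
score = sentence_bleu(
    references,
    hypothesis,
    weights=(0.25, 0.25, 0.25, 0.25),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU-4: {score:.3f}")

In practice, corpus-level variants and complementary metrics such as METEOR and CIDEr are typically reported alongside BLEU, since a single n-gram overlap score captures only part of caption quality.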
Keywords