Compressed video ensemble based pseudo-labeling for semi-supervised action recognition

Hayato Terao; Wataru Noguchi; Hiroyuki Iizuka; Masahito Yamamoto

Machine Learning with Applications (Sep 2022)

Affiliations

Hayato Terao: Graduate School of Information Science and Technology, Hokkaido University, Sapporo, 0600814, Hokkaido, Japan; Corresponding author.
Wataru Noguchi: Faculty of Information Science and Technology, Hokkaido University, Sapporo, 0600814, Hokkaido, Japan
Hiroyuki Iizuka: Faculty of Information Science and Technology, Hokkaido University, Sapporo, 0600814, Hokkaido, Japan; Center for Human Nature, Artificial Intelligence, and Neuroscience (CHAIN), Hokkaido University, Sapporo, 0600812, Hokkaido, Japan
Masahito Yamamoto: Faculty of Information Science and Technology, Hokkaido University, Sapporo, 0600814, Hokkaido, Japan; Center for Human Nature, Artificial Intelligence, and Neuroscience (CHAIN), Hokkaido University, Sapporo, 0600812, Hokkaido, Japan

Journal volume & issue: Vol. 9, p. 100336

Abstract

Some recent studies have applied deep-learning-based semi-supervised learning to action recognition. However, these methods are difficult to scale up because they take RGB frames as input, and obtaining those frames incurs computational and storage costs. In this paper, we propose a semi-supervised action recognition method that makes training easier to scale by using the features already stored in compressed videos. Our method extracts multiple types of input features directly from compressed videos without any decoding and generates artificial labels for unlabeled videos by ensembling the predictions made from these features. In addition to standard supervised training on labeled videos, our models are trained to predict the artificial labels from strongly augmented features of unlabeled compressed videos. We show that our method is more efficient and achieves better classification performance on some widely used datasets than conventional semi-supervised learning methods that use RGB frames.
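
The ensembling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the feature types, the ensembling rule, or any confidence threshold. The sketch assumes three hypothetical feature types (`iframe`, `motion`, `residual`), averages their softmax outputs, and keeps a pseudo-label only above an assumed threshold of 0.9, in the manner of common pseudo-labeling methods. The function names are also illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_pseudo_label(logits_per_feature, threshold=0.9):
    """Average class probabilities over all feature types (assumed rule).

    Returns (label, confidence); label is None when the averaged
    confidence falls below the assumed threshold.
    """
    probs = [softmax(l) for l in logits_per_feature.values()]
    n_classes = len(probs[0])
    mean = [sum(p[c] for p in probs) / len(probs) for c in range(n_classes)]
    conf = max(mean)
    label = mean.index(conf)
    return (label if conf >= threshold else None), conf


def cross_entropy(logits, label):
    """Cross-entropy of one prediction against an integer class label."""
    return -math.log(softmax(logits)[label])


def unlabeled_loss(ensemble_logits, strong_logits, threshold=0.9):
    """Loss on one unlabeled video.

    ensemble_logits: per-feature logits used to build the artificial label.
    strong_logits:   per-feature logits on strongly augmented inputs.
    Each per-feature model is trained toward the shared artificial label;
    videos without a confident label contribute zero (an assumption).
    """
    label, _ = ensemble_pseudo_label(ensemble_logits, threshold)
    if label is None:
        return 0.0
    losses = [cross_entropy(l, label) for l in strong_logits.values()]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    ensemble_in = {
        "iframe": [4.0, 0.0, 0.0],
        "motion": [3.0, 0.0, 0.0],
        "residual": [5.0, 0.0, 0.0],
    }
    strong_in = {
        "iframe": [1.0, 0.5, 0.0],
        "motion": [0.2, 0.1, 0.0],
        "residual": [2.0, 0.0, 0.0],
    }
    print(ensemble_pseudo_label(ensemble_in))
    print(unlabeled_loss(ensemble_in, strong_in))
```

In a full training loop this unlabeled term would be added to the standard supervised loss on labeled videos, with a weighting that the abstract does not specify.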

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
