IEEE Access (Jan 2021)

Semi-Supervised Temporal Segmentation of Manufacturing Work Video by Automatically Building a Hierarchical Tree of Category Labels

  • Kazuaki Nakamura
  • Naoko Nitta
  • Noboru Babaguchi
  • Kensuke Fujii
  • Satoki Matsumura
  • Eiji Nabata

DOI: https://doi.org/10.1109/ACCESS.2021.3076849
Journal volume & issue: Vol. 9, pp. 68017–68027

Abstract

Nowadays, many industrial companies visually record workers’ activities in order to streamline their work processes. However, untrimmed raw videos are hard to use, so it is desirable to automatically divide them into segments and recognize which kind of operation is performed in each segment. This task is called temporal video segmentation. We propose a method for achieving it, particularly targeting videos of manufacturing work performed with a specialized vehicle such as a hydraulic excavator. Extracting good visual features from the input videos is essential for high segmentation performance. Unsupervised methods can hardly achieve this, whereas supervised methods have the drawback that collecting a sufficient amount of training data is labor-intensive. To overcome both drawbacks, the proposed method employs a semi-supervised approach. We assume that a set of weakly labeled videos, in which only a sparse subset of frames carries a category label, is given as input; the labeled frames serve as training data for a desirable feature extractor. Under this assumption, the proposed method first divides the input videos into fixed-length short segments called primitive segments and then clusters them using visual features extracted by the above feature extractor. To achieve higher performance, we also use a hierarchical tree of the category labels and recursively perform the above process at each branch of the tree, where the tree is built automatically by the proposed method. In our experiments, we achieved a segmentation performance of 0.947 in F-measure, even when only 1.25% of all frames in the input videos were labeled.
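The pipeline sketched in the abstract (fixed-length primitive segments, clustered by learned visual features, with clustering applied recursively down a label hierarchy) can be illustrated in miniature as follows. This is only a rough sketch under stated assumptions: the paper trains its feature extractor on the sparsely labeled frames, whereas the features below are random placeholders; k-means with a hand-picked cluster count stands in for whatever clustering the authors actually use; and `make_primitive_segments`, `recursive_cluster`, the segment length, and the hand-written label tree are all hypothetical (the paper builds its tree automatically).

```python
import numpy as np
from sklearn.cluster import KMeans

def make_primitive_segments(frame_features, seg_len):
    """Average per-frame features over fixed-length windows,
    yielding one feature vector per primitive segment."""
    n_full = (len(frame_features) // seg_len) * seg_len
    windows = frame_features[:n_full].reshape(-1, seg_len, frame_features.shape[1])
    return windows.mean(axis=1)

def recursive_cluster(seg_features, tree):
    """Cluster segments among the branches at the root of a label tree,
    then recurse into each child subtree with only the segments assigned
    to that branch. `tree` is a nested dict: {branch_name: subtree_or_None}."""
    branches = list(tree)
    if len(branches) < 2 or len(seg_features) < len(branches):
        # Nothing left to split: assign everything to the first branch.
        return [branches[0]] * len(seg_features) if branches else []
    ids = KMeans(n_clusters=len(branches), n_init=10,
                 random_state=0).fit_predict(seg_features)
    labels = [None] * len(seg_features)
    for b_idx, name in enumerate(branches):
        members = np.where(ids == b_idx)[0]
        subtree = tree[name]
        if subtree:  # internal node: refine within this branch
            for i, lab in zip(members, recursive_cluster(seg_features[members], subtree)):
                labels[i] = lab
        else:        # leaf: the branch name is the final category
            for i in members:
                labels[i] = name
    return labels

# Toy run: 1000 frames of 128-d placeholder features, 25-frame primitive
# segments, and a two-level label tree written by hand for illustration.
rng = np.random.default_rng(0)
frames = rng.normal(size=(1000, 128))
segs = make_primitive_segments(frames, seg_len=25)
tree = {"digging": {"scoop": None, "dump": None}, "moving": None}
print(recursive_cluster(segs, tree)[:8])
```

The point of the recursion matches the intuition the abstract gives for the hierarchical scheme: each clustering step only has to separate a few coarse categories at a time, rather than all fine-grained operation labels at once.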

Keywords