Depth Video-Based Secondary Action Recognition in Vehicles via Convolutional Neural Network and Bidirectional Long Short-Term Memory with Spatial Enhanced Attention Mechanism

Weirong Shao; Mondher Bouazizi; Ohtuski Tomoaki

doi:10.3390/s24206604

Sensors (Oct 2024)

Depth Video-Based Secondary Action Recognition in Vehicles via Convolutional Neural Network and Bidirectional Long Short-Term Memory with Spatial Enhanced Attention Mechanism

Weirong Shao,
Mondher Bouazizi,
Ohtuski Tomoaki

Affiliations

Weirong Shao: Graduate School of Science and Technology, Keio University, Yokohama 223-8522, Japan
Mondher Bouazizi: Faculty of Science and Technology, Keio University, Yokohama 223-8522, Japan
Ohtuski Tomoaki: Faculty of Science and Technology, Keio University, Yokohama 223-8522, Japan

DOI: https://doi.org/10.3390/s24206604
Journal volume & issue: Vol. 24, no. 20
p. 6604

Abstract

Read online

Secondary actions in vehicles are activities that drivers engage in while driving that are not directly related to the primary task of operating the vehicle. Secondary Action Recognition (SAR) in drivers is vital for enhancing road safety and minimizing accidents related to distracted driving. It also plays an important part in modern car driving systems such as Advanced Driving Assistance Systems (ADASs), as it helps identify distractions and predict the driver’s intent. Traditional methods of action recognition in vehicles mostly rely on RGB videos, which can be significantly impacted by external conditions such as low light levels. In this research, we introduce a novel method for SAR. Our approach utilizes depth-video data obtained from a depth sensor located in a vehicle. Our methodology leverages the Convolutional Neural Network (CNN), which is enhanced by the Spatial Enhanced Attention Mechanism (SEAM) and combined with Bidirectional Long Short-Term Memory (Bi-LSTM) networks. This method significantly enhances action recognition ability in depth videos by improving both the spatial and temporal aspects. We conduct experiments using K-fold cross validation, and the experimental results show that on the public benchmark dataset Drive&Act, our proposed method shows significant improvement in SAR compared to the state-of-the-art methods, reaching an accuracy of about 84% in SAR in depth videos.

Published in Sensors

ISSN: 1424-8220 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/sensors

About the journal

Abstract

Keywords