How Deep Neural Networks Understand Motion? Toward Interpretable Motion Modeling by Leveraging the Relative Change in Position

Hehe Fan; Tao Zhuo; Xiaoyu Feng; Guoshun Nan

doi:10.34133/icomputing.0008

Intelligent Computing (Jan 2023)

Affiliations

Hehe Fan: School of Computing, National University of Singapore, Singapore, Singapore.
Tao Zhuo: Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences), Jinan, China.
Xiaoyu Feng: Department of Electronic Engineering, Tsinghua University, Beijing, China.
Guoshun Nan: School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China.

DOI: https://doi.org/10.34133/icomputing.0008
Journal volume & issue: Vol. 2

Abstract

Motion understanding plays an important role in video-based cross-media analysis and multiple knowledge representation learning. This paper discusses physical motion recognition and prediction by deep neural networks (DNNs), such as convolutional neural networks and recurrent neural networks. In physics, motion is the relative change in position with respect to time. To factor out the moving object and the background in which the motion happens, we focus on an ideal scenario in which a point moves in a plane. As the first contribution, we evaluate several popular DNN architectures from video research on modeling relative position change. The experimental results and conclusions can offer insight for action recognition and video prediction. As the second contribution, we propose a vector network (VecNet) to model the relative change in position. VecNet treats the motion in a short interval as a vector and, conversely, can move a point to the corresponding position given a vector representation. To represent motion over a long period, we use a long short-term memory (LSTM) network to aggregate or predict vector representations over time. The resulting VecNet+LSTM approach effectively supports both recognition and prediction, demonstrating that modeling relative position change is necessary for motion recognition and makes motion prediction easier.
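
The idea of motion as relative change in position can be illustrated with a minimal sketch. This is not the authors' VecNet: the learned networks are replaced here by exact arithmetic, and the names `displacements`, `move`, and `rollout` are hypothetical, introduced only for illustration. The sketch shows that two trajectories of a point in a plane that start at different positions but undergo the same motion share the same sequence of short-interval vectors, and that applying those vectors to a start point recovers the trajectory.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Vector = Tuple[int, int]


def displacements(trajectory: List[Point]) -> List[Vector]:
    """Relative change in position between consecutive time steps."""
    return [
        (x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:])
    ]


def move(point: Point, vector: Vector) -> Point:
    """Move a point to the position indicated by a displacement vector."""
    return (point[0] + vector[0], point[1] + vector[1])


def rollout(start: Point, vectors: List[Vector]) -> List[Point]:
    """Apply a sequence of displacement vectors, one per time step."""
    points = [start]
    for vector in vectors:
        points.append(move(points[-1], vector))
    return points


# Two trajectories with the same motion but different absolute positions
# (illustrative values, not data from the paper).
traj_a = [(0, 0), (1, 0), (2, 1), (2, 3)]
traj_b = [(5, 7), (6, 7), (7, 8), (7, 10)]

vectors_a = displacements(traj_a)
vectors_b = displacements(traj_b)

# The absolute positions differ, but the relative changes are identical.
same_motion = vectors_a == vectors_b

# Applying the vectors of trajectory A from B's start point recovers B.
reconstructed_b = rollout(traj_b[0], vectors_a)
```

In the approach described in the abstract, the per-interval vectors are learned representations and an LSTM aggregates or predicts them over time; the sketch above only mirrors the underlying position-difference structure.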

Published in Intelligent Computing

ISSN: 2771-5892 (Online)
Publisher: American Association for the Advancement of Science (AAAS)
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://spj.sciencemag.org/journals/icomputing/
