Aerial Violence Recognition Based on Spatial-Temporal Graph Convolutional Networks and Attention Model

SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu

doi:10.11896/jsjkx.210400272

Jisuanji kexue (Jun 2022)

Affiliations

SHAO Yan-hua, LI Wen-feng, ZHANG Xiao-qiang, CHU Hong-yu, RAO Yun-bo, CHEN Lu: 1 School of Information, Southwest University of Science and Technology, Mianyang, Sichuan 621000, China; 2 School of Information and Software Engineering, University of Electronic Science & Technology, Chengdu 610054, China

DOI: https://doi.org/10.11896/jsjkx.210400272
Journal volume & issue: Vol. 49, no. 6
pp. 254 – 261

Abstract

Violence occurs frequently in public areas, and video surveillance is of great significance for maintaining public safety. Compared with fixed cameras, unmanned aerial vehicles (UAVs) offer mobile surveillance. However, in aerial images, the rapid movement of the UAV and changes in its attitude and altitude cause motion blur and large variations in target scale. To address this problem, an attention-based spatial-temporal graph convolutional network (AST-GCN) is designed to recognize violent behavior in aerial video. The proposed method consists of two steps: a key frame detection network performs the initial localization, and the AST-GCN network performs behavior recognition from sequence features. First, to localize violence in the video, a cascaded key frame detection network based on human pose estimation is designed to detect violence key frames and to make a preliminary judgment of when the violence occurs. Second, skeleton information from multiple frames around the key frames is extracted from the video sequence, and the skeleton data is pre-processed, including normalization, screening, and completion, to improve robustness to different scenes and to partially missing key nodes. A skeleton spatial-temporal representation matrix is then constructed from the extracted skeleton information. Finally, the AST-GCN network, which integrates an attention module to improve feature representation, analyzes the multi-frame human skeleton information and completes the recognition of violent behavior. The method is validated on a self-built aerial violence dataset, and experimental results show that AST-GCN can recognize violence in aerial scenes with a recognition accuracy of 86.6%. The proposed method has important engineering value and scientific significance for aerial video surveillance and human pose understanding applications.
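The skeleton pre-processing step described in the abstract (normalization, screening, completion, then building a spatial-temporal representation) can be illustrated with a minimal sketch. The abstract does not give the details, so everything below is an assumption made for illustration: the (x, y, confidence) joint format, the confidence threshold, the nearest-frame completion rule, the channel x frame x joint layout, and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# Assumed joint format: (x, y, confidence), or None when the joint was not detected.
Joint = Optional[Tuple[float, float, float]]
Frame = List[Joint]


def normalize(frame: Frame, width: float, height: float) -> Frame:
    """Scale pixel coordinates to [0, 1] so features do not depend on image size."""
    return [None if j is None else (j[0] / width, j[1] / height, j[2]) for j in frame]


def screen(frames: List[Frame], min_joints: int, min_conf: float = 0.3) -> List[Frame]:
    """Mark low-confidence joints as missing, then drop frames with too few joints."""
    kept = []
    for frame in frames:
        cleaned = [j if j is not None and j[2] >= min_conf else None for j in frame]
        if sum(j is not None for j in cleaned) >= min_joints:
            kept.append(cleaned)
    return kept


def complete(frames: List[Frame]) -> List[Frame]:
    """Fill each missing joint from the nearest frame in which it was detected.

    Filled joints get confidence 0.0 so later stages can tell them apart.
    This rule is an illustrative assumption, not the paper's completion method.
    """
    num_frames = len(frames)
    num_joints = len(frames[0]) if frames else 0
    filled = [list(frame) for frame in frames]
    for v in range(num_joints):
        known = [t for t in range(num_frames) if frames[t][v] is not None]
        for t in range(num_frames):
            if frames[t][v] is None:
                if known:
                    nearest = min(known, key=lambda k: abs(k - t))
                    x, y, _ = frames[nearest][v]
                    filled[t][v] = (x, y, 0.0)
                else:
                    filled[t][v] = (0.0, 0.0, 0.0)
    return filled


def to_ctv(frames: List[Frame]) -> List[List[List[float]]]:
    """Rearrange frame x joint x channel data into channel x frame x joint."""
    return [[[joint[c] for joint in frame] for frame in frames] for c in range(3)]


# Toy input: three frames, three joints, image size 400 x 200 pixels.
raw = [
    [(100.0, 50.0, 0.9), (200.0, 100.0, 0.8), None],
    [(110.0, 60.0, 0.9), None, (300.0, 150.0, 0.7)],
    [None, None, (310.0, 160.0, 0.1)],  # only a low-confidence joint: screened out
]
frames = [normalize(f, width=400, height=200) for f in raw]
frames = complete(screen(frames, min_joints=2))
tensor = to_ctv(frames)  # tensor[channel][frame][joint]
```

In this toy run the third frame is discarded by screening, and the two gaps in the remaining frames are filled from the neighbouring frame. The channel x frame x joint layout is chosen here only because it is a common input arrangement for graph convolution over skeleton sequences.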

violence recognition|human pose estimation|aerial photography|spatial-temporal graph convolutional|cascade network|attention mechanism

Published in Jisuanji kexue

ISSN: 1002-137X (Print)
Publisher: Editorial office of Computer Science
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software; Technology: Technology (General)
Website: http://www.jsjkx.com/CN/1002-137X/home.shtml
