Dilated spatial–temporal convolutional auto-encoders for human fall detection in surveillance videos
Suyuan Li, Xin Song, Siyang Xu, Haoyang Qi, Yanbo Xue
Affiliations
Suyuan Li
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
Xin Song
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China; Corresponding author at: School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China.
Siyang Xu
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China
Haoyang Qi
School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China; School of Computer and Communication Engineering, Northeastern University at Qinhuangdao, Qinhuangdao 066004, China
Yanbo Xue
BOSS ZhiPin Career Science Lab (CSL), Beijing 100028, China
Although supervised learning methods have demonstrated remarkable performance on fall detection, they require a substantial quantity of manually labeled training data. In this paper, we combine dilated convolution and LSTM in an auto-encoder that can be trained on unlabeled data, saving labeling time and resources. A novel fall score is then computed from the high-quality reconstructed frames to detect falls. Extensive experimental results indicate that the proposed method improves detection performance, achieving a recognition rate of 97.1%, a sensitivity of 93.9%, and a precision of 95.1% on the UR dataset.
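To make the reconstruction-based idea concrete, the following is a minimal sketch of how a per-frame score and the three reported metrics could be computed. It is not the fall score proposed in the paper, whose definition is not given here. It assumes a common stand-in from reconstruction-based anomaly detection: the mean squared error between each input frame and its reconstruction, min-max normalized over the clip. The function names (`frame_errors`, `fall_scores`, `detection_metrics`) are hypothetical, and "recognition rate" is assumed to mean overall accuracy.

```python
import numpy as np


def frame_errors(frames, recons):
    """Mean squared reconstruction error per frame.

    Both inputs are arrays shaped (T, H, W): T grayscale frames.
    """
    frames = np.asarray(frames, dtype=np.float64)
    recons = np.asarray(recons, dtype=np.float64)
    return ((frames - recons) ** 2).mean(axis=(1, 2))


def fall_scores(errors, eps=1e-12):
    """Min-max normalize errors to [0, 1]; higher means more anomalous.

    Assumption: this normalization stands in for the paper's score.
    """
    errors = np.asarray(errors, dtype=np.float64)
    span = errors.max() - errors.min()
    return (errors - errors.min()) / (span + eps)


def detection_metrics(y_true, y_pred):
    """Recognition rate (assumed to be accuracy), sensitivity, precision."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))    # falls correctly flagged
    tn = int(np.sum(~y_true & ~y_pred))  # normal frames correctly passed
    fp = int(np.sum(~y_true & y_pred))   # false alarms
    fn = int(np.sum(y_true & ~y_pred))   # missed falls
    return {
        "recognition_rate": (tp + tn) / y_true.size,
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
    }
```

In use, an auto-encoder trained only on normal activity would reconstruct fall frames poorly, so frames whose score exceeds a chosen threshold would be flagged as falls and the flags compared against ground-truth labels with `detection_metrics`. The threshold itself is a tuning choice that this sketch leaves open.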