IET Computer Vision (Apr 2018)

Using mel‐frequency audio features from footstep sound and spatial segmentation techniques to improve frame‐based moving object detection

  • Aditya Roshan
  • Yun Zhang

DOI
https://doi.org/10.1049/iet-cvi.2017.0209
Journal volume & issue
Vol. 12, no. 3
pp. 341 – 349

Abstract

Moving object detection in video streams is a challenging and integral part of computer vision, with applications in surveillance, traffic and site monitoring, and navigation. Compared with background‐based techniques, the frame differencing technique is computationally inexpensive. However, it detects only the boundary of a moving object. Its detection rate drops under changing light conditions, shadows, poor contrast between object and background, and slow object motion, because the number of noisy frames and of frames with a missing or partially detected object increases. Morphological operations with a large kernel size are not a reliable remedy for the noise, as they might also remove the boundary (or part) of the moving object. In this study, the authors propose a methodology to improve the frame differencing technique using the footstep sound generated by a moving object. Audio recorded with the video system is processed, and footstep sound is detected using audio features computed as mel‐frequency cepstral coefficients. The frames falling within each footstep sound are counted and processed. Spatial segmentation is used to find the moving object in noisy frames. A missing or partially detected object is recovered by modelling an ellipse from the moving object detected in neighbouring frames.
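
The observation that frame differencing detects only the boundary of a moving object can be illustrated with a minimal sketch. It assumes NumPy and two synthetic greyscale frames. The function names, the threshold of 25 and the frame sizes are illustrative choices and are not taken from the paper.

```python
import numpy as np


def frame_difference(prev_frame, curr_frame, threshold=25):
    """Return a boolean mask of pixels whose intensity changed by more than `threshold`.

    The threshold value is an assumption for illustration only.
    """
    # Cast to a signed type so the subtraction cannot wrap around in uint8.
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return diff > threshold


def make_frame(x0, size=10, shape=(40, 40)):
    """Synthetic frame: a uniform bright square on a dark background (hypothetical data)."""
    frame = np.zeros(shape, dtype=np.uint8)
    frame[15:15 + size, x0:x0 + size] = 200
    return frame


# The square moves 2 pixels to the right between consecutive frames.
prev_frame = make_frame(10)
curr_frame = make_frame(12)

mask = frame_difference(prev_frame, curr_frame)
changed = int(mask.sum())   # pixels flagged as moving
object_area = 10 * 10       # true area of the object in pixels

print(f"flagged {changed} of {object_area} object pixels")
```

Only the leading and trailing edges of the square are flagged here. Its uniform interior has the same intensity in both frames, so the difference there is zero. A slower object produces an even thinner response, which is consistent with the reduced detection rate described in the abstract.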

Keywords