Electronic Research Archive (Jul 2024)
Fully convolutional video prediction network for complex scenarios
Abstract
Traditional predictive models, often used in simpler settings, face issues like high latency and computational demands, especially in complex real-world environments. Recent progress in deep learning has advanced spatiotemporal prediction research, yet challenges persist in general scenarios: (ⅰ) Latency and computational load of models; (ⅱ) dynamic nature of real-world environments; (ⅲ) complex motion and monitoring scenes. To overcome these challenges, we introduced a novel spatiotemporal prediction framework. It replaced high-latency recurrent models with fully convolutional ones, improving inference speed. Furthermore, it addressed the dynamic nature of environments with multilevel frequency domain encoders and decoders, facilitating spatial and temporal learning. For complex monitoring scenarios, a large receptive field token mixer spatial-frequency attention units (SAU) and time attention units (TAU) ensured temporal and spatial continuity. This framework outperformed current methods in accuracy and speed on public datasets, showing promising practical applications beyond electricity monitoring.
Keywords