IET Image Processing (Oct 2024)
A pyramid Gaussian pooling based CNN and transformer hybrid network for smoke segmentation
Abstract
Visual smoke semantic segmentation is challenging because smoke is semi‐transparent, variable in shape, and complex in texture. To improve segmentation performance, a convolutional neural network (CNN) and transformer hybrid network based on pyramid Gaussian pooling (PGP) is proposed for smoke segmentation. A PGP method is first designed to suppress noise through low‐pass filtering. The output of PGP is then reshaped into a set of visual tokens for transformers, yielding a PGP‐transformer module that makes full use of the self‐attention mechanism. Finally, the PGP‐transformer module is inserted into a U‐shaped architecture with skip connections. Extensive experiments show that the method significantly outperforms existing state‐of‐the‐art algorithms on virtual and real smoke datasets, and ablation experiments verify the effectiveness of the proposed modules.
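The abstract does not give the exact PGP formulation, so the following is only a minimal sketch of the general idea: pool a feature map at several grid resolutions using Gaussian-weighted (low-pass) averages, then reshape the pooled values into tokens that a transformer could consume. The function names (`gaussian_weights`, `gaussian_pool`, `pgp_tokens`), the pyramid levels `(1, 2, 4)`, and the choice of sigma are all assumptions made for illustration and are not taken from the paper.

```python
import math

def gaussian_weights(height, width, sigma_scale=0.5):
    """Normalized 2D Gaussian weights centered on a height x width window.

    sigma_scale is a hypothetical parameter: sigma = window size * sigma_scale.
    """
    cy, cx = (height - 1) / 2.0, (width - 1) / 2.0
    sy, sx = height * sigma_scale, width * sigma_scale
    w = [[math.exp(-0.5 * (((y - cy) / sy) ** 2 + ((x - cx) / sx) ** 2))
          for x in range(width)] for y in range(height)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]

def gaussian_pool(channel, bins):
    """Pool one H x W channel into bins * bins Gaussian-weighted averages.

    Assumes H >= bins and W >= bins so that no window is empty.
    """
    h, w = len(channel), len(channel[0])
    out = []
    for by in range(bins):
        y0, y1 = by * h // bins, (by + 1) * h // bins
        for bx in range(bins):
            x0, x1 = bx * w // bins, (bx + 1) * w // bins
            wts = gaussian_weights(y1 - y0, x1 - x0)
            out.append(sum(wts[y][x] * channel[y0 + y][x0 + x]
                           for y in range(y1 - y0) for x in range(x1 - x0)))
    return out

def pgp_tokens(feature_map, levels=(1, 2, 4)):
    """Turn a C x H x W feature map into a list of C-dimensional tokens.

    Each pyramid level contributes bins * bins tokens (assumed layout).
    """
    tokens = []
    for bins in levels:
        pooled = [gaussian_pool(ch, bins) for ch in feature_map]  # C x bins^2
        tokens.extend(list(t) for t in zip(*pooled))  # bins^2 tokens of size C
    return tokens
```

With levels `(1, 2, 4)`, a feature map of any sufficient spatial size yields 1 + 4 + 16 = 21 tokens, each of dimension C. In a real network these tokens would be passed to self-attention layers, and the pooling would typically be written with a tensor library rather than nested lists.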
Keywords