Intelligent Systems with Applications (Sep 2024)

Multi-scale transformer network for super-resolution of visible and thermal air images

  • Hèdi Fkih,
  • Abdelaziz Kallel,
  • Zied Chtourou

Journal volume & issue
Vol. 23
p. 200429

Abstract


Reference image-based Super-Resolution (RefSR) improves the quality of a Low-Resolution (LR) input image by leveraging the additional information provided by a High-Resolution (HR) reference image (Ref). Existing RefSR methods process thermal or visible streams separately, and they often struggle to enhance the resolution of small objects such as Mini/Micro Unmanned Aerial Vehicles (UAVs) because of the resolution disparities between the input and reference images. To cope with these challenges in early UAV detection for video surveillance, we propose ThermoVisSR, a multiscale texture transformer for the Super-Resolution (SR) of visible and thermal images of Mini/Micro UAVs. Our approach aims to reconstruct the fine details of these objects while preserving the approximation (the shape and color of the different scene objects) already contained in the LR image. Hence, our model is divided into two streams that deal separately with approximation and detail reconstruction. In the first stream, we introduce a Convolutional Neural Network (CNN) fusion backbone to extract the Low-Frequency (LF) approximation from the original LR image pairs. In the second stream, to extract the details from the Ref image, we blend features from both visible and thermal sources to make the most of what each offers. We then apply the High-Frequency Texture Transformer (HFTT) across various resolutions of the merged features to ensure accurate correspondence matching and significant transfer of High-Frequency (HF) patches from the Ref to the LR images. Moreover, to adapt the injection to the different bands, we incorporate the separable software decoder (SSD) into the HFTT, which allows it to capture channel-specific details during the reconstruction phase. We validated our approach on a newly created dataset of air images of Mini/Micro UAVs. Experimental results demonstrate that the proposed model consistently outperforms state-of-the-art approaches in both qualitative and quantitative assessments.
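The split between a low-frequency approximation and high-frequency details can be illustrated with a toy sketch. This is not the ThermoVisSR network: the box blur, the nearest-neighbour upsampling, and the direct addition of the reference residual are simplifying assumptions that stand in for the CNN fusion backbone and the HFTT, and all function names are hypothetical. It also assumes a single-channel Ref image that is already aligned with, and the same size as, the upsampled LR image.

```python
def box_blur(img, radius=1):
    """Low-pass filter: mean over a (2r+1)x(2r+1) window, clamped at borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(window) / len(window)
    return out


def split_frequencies(img, radius=1):
    """Return (low, high) such that low + high reproduces img element-wise."""
    low = box_blur(img, radius)
    high = [[p - l for p, l in zip(row, low_row)]
            for row, low_row in zip(img, low)]
    return low, high


def upsample_nearest(img, factor):
    """Nearest-neighbour upsampling by an integer factor."""
    return [[row[x // factor] for x in range(len(row) * factor)]
            for row in img for _ in range(factor)]


def inject_details(lr, ref, factor, radius=1):
    """Keep the upsampled LR image as the approximation and add the
    high-frequency residual of the (assumed aligned) reference image."""
    base = upsample_nearest(lr, factor)
    _, ref_high = split_frequencies(ref, radius)
    return [[b + h for b, h in zip(base_row, high_row)]
            for base_row, high_row in zip(base, ref_high)]
```

With a flat reference image the residual is zero, so `inject_details` returns the upsampled LR image unchanged. Any texture in the reference shows up only as an additive detail on top of the LR approximation, which is the separation of roles the abstract describes.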

Keywords