Multi-scale transformer network for super-resolution of visible and thermal air images

Hèdi Fkih; Abdelaziz Kallel; Zied Chtourou

Intelligent Systems with Applications (Sep 2024)

Affiliations

Hèdi Fkih: ip-label, Tunis, Tunisia; Signals systeMs aRtificial intelligence and neTworkS Laboratory, Sfax, Tunisia; Corresponding author at: Digital Research Center of Sfax, Sfax 3021, Tunisia.
Abdelaziz Kallel: Digital Research Center of Sfax, Sfax 3021, Tunisia; Signals systeMs aRtificial intelligence and neTworkS Laboratory, Sfax, Tunisia
Zied Chtourou: School of aeronautical specialties, Sfax, Tunisia

Journal volume & issue: Vol. 23, p. 200429

Abstract

Reference image-based Super-Resolution (RefSR) improves the quality of a Low-Resolution (LR) input image by leveraging the additional information provided by a High-Resolution (HR) reference image (Ref). Existing RefSR methods handle thermal or visible flows separately, and they often struggle to enhance the resolution of small objects such as Mini/Micro Unmanned Aerial Vehicles (UAVs) because of the resolution disparity between the input and reference images. To address these challenges for early UAV detection in the context of video surveillance, we propose ThermoVisSR, a multiscale texture transformer for the Super-Resolution (SR) of visible and thermal images of Mini/Micro UAVs. Our approach aims to reconstruct the fine details of these objects while preserving their approximation (the body form and color of the different scene objects) already contained in the LR image. The model is therefore divided into two streams that deal separately with approximation and detail reconstruction. In the first stream, we introduce a Convolutional Neural Network (CNN) fusion backbone to extract the Low-Frequency (LF) approximation from the original LR image pairs. In the second stream, to extract details from the Ref image, we blend features from the visible and thermal sources to make the most of what each offers. We then apply the High-Frequency Texture Transformer (HFTT) at several resolutions of the merged features to ensure accurate correspondence matching and significant transfer of High-Frequency (HF) patches from the Ref to the LR images. Moreover, to adapt the injection to the different bands, we incorporate the separable software decoder (SSD) into the HFTT, which allows it to capture channel-specific details during the reconstruction phase. We validated our approach on a newly created dataset of air images of Mini/Micro UAVs. Experimental results demonstrate that the proposed model consistently outperforms state-of-the-art approaches in both qualitative and quantitative assessments.
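
The abstract gives no implementation details, so the following is only a minimal NumPy sketch of the two ideas it names: separating an image into a low-frequency approximation and high-frequency details, and matching patches between an input and a reference by similarity. It is not the authors' ThermoVisSR code. A box blur stands in for the CNN fusion backbone, cosine similarity between raw patches stands in for the transformer's correspondence matching, and all function names (`split_frequencies`, `extract_patches`, `match_patches`) are hypothetical.

```python
import numpy as np


def box_blur(img, k=3):
    """Low-pass filter: mean over a k x k window with edge padding."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img, dtype=float)
    h, w = img.shape
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def split_frequencies(img, k=3):
    """Return (low, high) such that low + high reproduces img."""
    low = box_blur(img, k)   # stand-in for the LF approximation
    high = img - low         # residual holds the HF details
    return low, high


def extract_patches(img, p):
    """Cut img into non-overlapping p x p patches, one flattened patch per row."""
    h, w = img.shape
    rows = [
        img[y:y + p, x:x + p].ravel()
        for y in range(0, h - p + 1, p)
        for x in range(0, w - p + 1, p)
    ]
    return np.stack(rows)


def match_patches(query, keys, eps=1e-8):
    """For each query patch, return the index and cosine score of the best key patch."""
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + eps)
    k = keys / (np.linalg.norm(keys, axis=1, keepdims=True) + eps)
    sim = q @ k.T            # pairwise cosine similarities
    return sim.argmax(axis=1), sim.max(axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    lr_up = rng.random((8, 8))   # assumed: LR image already upsampled to the Ref size
    ref = rng.random((8, 8))     # assumed: HR reference image

    low, _ = split_frequencies(lr_up)
    _, ref_high = split_frequencies(ref)

    # Match on image content, then look up the HF patch of the best reference match.
    idx, score = match_patches(extract_patches(lr_up, 4), extract_patches(ref, 4))
    transferred = extract_patches(ref_high, 4)[idx] * score[:, None]
    print(low.shape, transferred.shape)
```

In this sketch the similarity score also weights the transferred detail, so weakly matched reference patches contribute less; that weighting is an illustrative design choice rather than something stated in the abstract.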

Published in Intelligent Systems with Applications

ISSN: 2667-3053 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/intelligent-systems-with-applications
