IEEE Access (Jan 2024)
Image Translation and Reconstruction Using a Single Dual Mode Lightweight Encoder
Abstract
The richness of textures and semantic information from RGB images can be complemented in computer vision by the robustness of thermal images to light variations and weather artifacts. While many models rely on inputs from a single sensor modality, image translation among modalities can be a solution. Existing works use large models that operate in only one translation direction, which causes problems in computation-limited applications and lacks the flexibility to work interchangeably across modalities. Three-channel cameras extract visually rich features, but processing them on embedded platforms becomes a bottleneck. Furthermore, edge computing systems impose the burden of compressing data to be sent elsewhere. To address these issues, we propose a novel architecture with a single lightweight encoder capable of working in dual mode, encoding inputs from both grayscale and thermal images into very compact latent vectors. The encoding is then used for cross-modal image translation, grayscale image colorization, and thermal image reconstruction, thus allowing 1) different downstream tasks on different modalities, 2) visually rich features from grayscale images, and 3) data compression. Four different generators are employed, and training occurs in an adversarial fashion with two discriminators. The proposed loss function contains not only adversarial terms but also reconstruction error terms, which induce consistency and contrast preservation across translation and reconstruction. Results, backed by evaluation over multiple metrics, demonstrate that the model performs these tasks with competitive translation/reconstruction quality for images under different lighting conditions. Finally, we perform ablation studies to demonstrate the effectiveness of the combined loss terms.
Keywords