IEEE Access (Jan 2024)
Image Translation and Reconstruction Using a Single Dual Mode Lightweight Encoder
Abstract
The richness of textures and semantic information from RGB images can be complemented in computer vision by the robustness of thermal images to light variations and weather artifacts. While many models rely on inputs from a single sensor modality, image translation among modalities can be a solution. Existing works use large models that operate in only one translation direction, which causes problems in computation-limited applications and lacks the flexibility to work interchangeably across modalities. Three-channel cameras extract visually rich features, but processing them on embedded platforms becomes a bottleneck. Furthermore, edge computing systems impose the burden of compressing data to be sent elsewhere. To address these issues, we propose a novel architecture with a single lightweight encoder capable of working in dual mode, encoding inputs from both grayscale and thermal images into very compact latent vectors. The encoding is then used for cross-modal image translation, grayscale image colorization, and thermal image reconstruction, thus allowing 1) different downstream tasks on different modalities, 2) visually rich features from grayscale images, and 3) data compression. Four different generators are employed, and training occurs in an adversarial fashion with two discriminators. The proposed loss function contains not only adversarial terms but also reconstruction error terms, which induce consistency and contrast preservation across translation and reconstruction. Results, backed by evaluation over multiple metrics, demonstrate that the model performs these tasks with competitive translation/reconstruction quality for images under different lighting conditions. Finally, we perform ablation studies to demonstrate the effectiveness of the combined loss terms.
Keywords