IEEE Access (Jan 2024)

LatentColorization: Latent Diffusion-Based Speaker Video Colorization

  • Rory Ward
  • Dan Bigioi
  • Shubhajit Basak
  • John G. Breslin
  • Peter Corcoran

DOI
https://doi.org/10.1109/ACCESS.2024.3406249
Journal volume & issue
Vol. 12
pp. 81105–81121

Abstract

While current research predominantly focuses on image-based colorization, the domain of video-based colorization remains relatively unexplored. Many existing video colorization techniques operate frame by frame, overlooking the critical aspect of temporal coherence between successive frames; this can produce inconsistencies such as flickering or abrupt color transitions. To address these challenges, we combine the generative capabilities of a fine-tuned latent diffusion model with an autoregressive conditioning mechanism to ensure temporal consistency in automatic speaker video colorization. We demonstrate strong improvements over existing methods on established quality metrics, namely PSNR, SSIM, FID, FVD, NIQE, and BRISQUE. Specifically, we achieve an 18% improvement when FVD is used as the evaluation metric. Furthermore, in a subjective study, users preferred LatentColorization over the existing state-of-the-art DeOldify 80% of the time. Our dataset combines conventional datasets with videos from television and movies. A short demonstration of our results is available at https://youtu.be/vDbzsZdFuxM.
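
The sketch below is a minimal, hedged illustration of the autoregressive conditioning idea described in the abstract, not the authors' implementation: the PlaceholderDenoiser module, the colorize_video function, the latent shapes, and the simplified DDIM schedule are all assumptions chosen for demonstration. The point it shows is that each frame's denoising loop receives the latent of the previously colorized frame as an extra condition, which carries color information forward in time.

```python
import torch
import torch.nn as nn

LATENT_CH, H, W, T_STEPS = 4, 32, 32, 50

class PlaceholderDenoiser(nn.Module):
    """Stands in for a fine-tuned latent-diffusion UNet (assumption).
    Predicts noise from the noisy color latent, the grayscale-frame latent,
    and the previously colorized frame's latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3 * LATENT_CH, LATENT_CH, kernel_size=3, padding=1)

    def forward(self, x_t, gray_lat, prev_lat, t):
        cond = torch.cat([x_t, gray_lat, prev_lat], dim=1)  # channel-wise conditioning
        return self.net(cond)  # predicted noise (a real model would also embed t)

@torch.no_grad()
def colorize_video(gray_latents, denoiser, alphas_cumprod):
    """Colorize frames one at a time; each frame is conditioned on the
    previously colorized frame to keep colors temporally consistent."""
    prev_lat = torch.zeros(1, LATENT_CH, H, W)  # no previous frame for frame 0
    colorized = []
    for gray_lat in gray_latents:               # autoregressive outer loop
        x = torch.randn(1, LATENT_CH, H, W)     # start each frame from noise
        for t in reversed(range(T_STEPS)):      # simplified deterministic DDIM loop
            eps = denoiser(x, gray_lat, prev_lat, t)
            a_t = alphas_cumprod[t]
            a_prev = alphas_cumprod[t - 1] if t > 0 else torch.tensor(1.0)
            x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()   # predicted clean latent
            x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
        prev_lat = x                            # hand the result to the next frame
        colorized.append(x)
    return colorized

denoiser = PlaceholderDenoiser()
alphas_cumprod = torch.linspace(0.999, 0.01, T_STEPS).cumprod(dim=0)
gray_frames = [torch.randn(1, LATENT_CH, H, W) for _ in range(3)]  # dummy grayscale latents
print(len(colorize_video(gray_frames, denoiser, alphas_cumprod)))  # -> 3
```

The essential design choice illustrated here is the prev_lat handoff after each frame finishes denoising; in an actual latent diffusion pipeline the placeholder convolution would be a fine-tuned UNet and the grayscale/color latents would come from a pretrained VAE encoder and decoder.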

Keywords