Toward Accurate Quality Assessment of Machine-Generated Infrared Video Using Fréchet Video Distance

Huaizheng Lu; Shiwei Wang; Dedong Zhang; Bin Huang; Erkang Chen; Yunfeng Sui

doi:10.1109/ACCESS.2024.3453406

IEEE Access (Jan 2024)

Affiliations

Huaizheng Lu: Department of Computer Science and Technology, School of Computer Engineering, Jimei University, Xiamen, China
Shiwei Wang: Department of Computer Science and Technology, School of Computer Engineering, Jimei University, Xiamen, China
Dedong Zhang: Department of Computer Science and Technology, School of Computer Engineering, Jimei University, Xiamen, China
Bin Huang: Department of Computer Science and Technology, School of Computer Engineering, Jimei University, Xiamen, China
Erkang Chen: Department of Computer Science and Technology, School of Computer Engineering, Jimei University, Xiamen, China
Yunfeng Sui: Research Center, Second Research Institute of CAAC, Chengdu, China

DOI: https://doi.org/10.1109/ACCESS.2024.3453406
Journal volume & issue: Vol. 12
pp. 168837 – 168852

Abstract

Video generation methods have important implications for visual control and decision-making. Current research often uses the Fréchet Video Distance (FVD) to evaluate machine-generated video. However, FVD has not been thoroughly verified on non-visible-light video, especially the widely used infrared band. There is therefore an urgent need to test the reliability and generalization ability of FVD on real infrared video data. Toward that goal, we first collected mainstream infrared video datasets and added various types of noise to synthesize infrared videos at different quality levels. Experiments on this synthetic dataset demonstrate the feasibility of using FVD to assess infrared video quality. Next, we trained a Pix2PixGAN network on a dataset of aligned visible and infrared image pairs. The trained model can generate infrared videos at different quality levels. With these generated videos, our experiments show that FVD can distinguish quality differences among infrared videos. In particular, we found that the lack of labeled infrared datasets and the relatively small size of infrared video datasets have a negative impact on calculating credible FVD values, because extracting effective infrared video features remains a difficult problem. Our experimental results suggest that infrared video features can be extracted with I3D models pre-trained on large-scale visible-light video, and that the resulting FVD values are even better than those obtained with I3D models pre-trained directly on infrared video. Our study provides a basis for using FVD to evaluate the quality of machine-generated videos under multispectral conditions.
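
As an illustration of the metric the abstract refers to, the following minimal sketch computes the Fréchet distance between two sets of feature vectors, each summarized by its mean and covariance. It is not the authors' code. In FVD the features come from an I3D network; here random vectors stand in for them, and the function name `frechet_distance`, the feature dimension, and the noise levels are all assumptions chosen for illustration. The loop mimics the synthetic-degradation idea in the simplest possible way, by adding Gaussian noise of increasing strength to the stand-in features rather than to video frames.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_samples, feature_dim).
    Hypothetical helper for illustration; real FVD pipelines obtain
    these features from a pre-trained I3D network.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs (up to numerical error).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Stand-in "reference" features (assumption: 512 clips, 16-dim features).
rng = np.random.default_rng(0)
reference = rng.normal(size=(512, 16))

# Degrade the features with increasing noise and record the distance.
noise_levels = (0.0, 0.5, 1.0, 2.0)
scores = []
for sigma in noise_levels:
    degraded = reference + rng.normal(scale=sigma, size=reference.shape)
    scores.append(frechet_distance(reference, degraded))

for sigma, score in zip(noise_levels, scores):
    print(f"noise sigma={sigma:.1f}  distance={score:.4f}")
```

Under these assumptions the distance is essentially zero for the undegraded copy and grows with the noise level, which is the qualitative behaviour a quality metric is expected to show.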

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
