IEEE Access (Jan 2024)

Toward Accurate Quality Assessment of Machine-Generated Infrared Video Using Fréchet Video Distance

  • Huaizheng Lu,
  • Shiwei Wang,
  • Dedong Zhang,
  • Bin Huang,
  • Erkang Chen,
  • Yunfeng Sui

DOI
https://doi.org/10.1109/ACCESS.2024.3453406
Journal volume & issue
Vol. 12
pp. 168837 – 168852

Abstract

Video generation methods have important implications for visual control and decision-making. Current research often uses the Fréchet Video Distance (FVD) as an evaluation metric for machine-generated video. However, FVD has not been thoroughly verified on non-visible light, especially the widely used infrared band. There is therefore an urgent need to test the reliability and generalization ability of FVD on real infrared video data. Toward that goal, we first collected mainstream infrared video datasets and added various types of noise to synthesize infrared videos of different quality levels. Experiments on this synthetic dataset demonstrate the feasibility of using FVD to assess the quality of infrared video. Next, we trained a Pix2PixGAN network on a dataset of aligned visible and infrared image pairs. The trained model can generate infrared videos of different quality levels. With these generated videos, our experiments show that FVD is able to distinguish quality differences between infrared videos. In particular, we found that the lack of labeled infrared datasets and the relatively small size of infrared video datasets have a negative impact on calculating credible FVD values, because extracting effective infrared video features remains a difficult problem. Our experimental results suggest that infrared video features can be extracted using I3D models pre-trained on large-scale visible light video, and that the resulting FVD values are even better than those obtained with I3D models pre-trained directly on infrared video. Our study provides a basis for using FVD to evaluate the quality of machine-generated videos under multispectral conditions.
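
As a rough illustration of the metric discussed above: FVD is conventionally computed as the Fréchet distance between two Gaussians fitted to I3D feature vectors of real and generated videos. The sketch below is not the authors' code. It assumes the features have already been extracted, and it uses random vectors as hypothetical stand-ins for I3D features. The function name `frechet_distance` and the variables `clean`, `mild`, and `heavy` are illustrative assumptions only.

```python
import numpy as np


def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b: arrays of shape (num_videos, feature_dim).
    Hypothetical helper; real FVD would pass I3D features here.
    """
    mu_a = feats_a.mean(axis=0)
    mu_b = feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b, which are real and non-negative for
    # positive semi-definite inputs. Clip small negative round-off.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * trace_sqrt)


# Stand-in data: random "features" degraded by two noise levels
# (assumption: noise is applied to features only for demonstration;
# the study adds noise to the videos themselves).
rng = np.random.default_rng(0)
clean = rng.normal(size=(512, 16))
mild = clean + 0.5 * rng.normal(size=clean.shape)
heavy = clean + 2.0 * rng.normal(size=clean.shape)

print("clean vs mild :", frechet_distance(clean, mild))
print("clean vs heavy:", frechet_distance(clean, heavy))
```

In this toy setting the distance grows with the amount of added noise, which is the qualitative behavior the synthetic-noise experiments test for on real infrared video.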

Keywords