IEEE Access (Jan 2024)
ZEN-IQA: Zero-Shot Explainable and No-Reference Image Quality Assessment With Vision Language Model
Abstract
No-reference image quality assessment (NR-IQA), which aims to estimate the perceptual quality of a degraded image without accessing the corresponding original image, is a key challenge in low-level computer vision. Recent advances in deep learning have enabled the development of high-performance NR-IQA methods. However, such methods are limited, as they are highly dependent on the training dataset. To address this limitation and avoid task-specific training, an alternative method has been proposed that employs pre-trained vision-language models for zero-shot NR-IQA; however, this approach does not provide any basis for its decisions and is therefore not explainable. In this study, we propose ZEN-IQA, a new zero-shot and explainable NR-IQA method. By utilizing carefully constructed prompt pairs and triplets, ZEN-IQA makes the evaluation process more intuitive and easier to understand. Our comparative analysis reveals that ZEN-IQA not only offers high interpretability but also outperforms methods using handcrafted features and state-of-the-art deep learning methods trained on datasets that differ from the test set. We also applied ZEN-IQA to images before and after processing and conducted experiments to evaluate how perceptual quality changes with image processing. The code is publicly available at https://github.com/mtakamichi/ZEN-IQA.
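To illustrate the prompt-pair idea described above, the following is a minimal sketch of zero-shot quality scoring with a CLIP-style vision-language model via the Hugging Face transformers API. The model checkpoint, prompt wording, and score aggregation here are assumptions for illustration only; ZEN-IQA's actual prompt pairs and triplets are defined in the official repository linked above.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical antonym prompt pair; ZEN-IQA's actual prompts may differ.
PROMPTS = ["a good quality photo", "a bad quality photo"]

# Assumed checkpoint for illustration; any CLIP-style model could be substituted.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def zero_shot_quality(image_path: str) -> float:
    """Return a [0, 1] quality score: softmax probability of the 'good' prompt."""
    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=PROMPTS, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image holds image-text similarities scaled by the learned temperature.
    probs = outputs.logits_per_image.softmax(dim=-1)  # shape: (1, len(PROMPTS))
    return probs[0, 0].item()

print(zero_shot_quality("example.jpg"))  # closer to 1.0 means higher predicted quality
```

Because each score is tied to explicit natural-language prompts, the relative probabilities assigned to the prompts can serve as a human-readable basis for the quality judgment, which is the interpretability property the abstract emphasizes.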
Keywords