IEEE Access (Jan 2022)
Important Region Estimation Using Image Captioning
Abstract
When images and videos are stored on a device with limited capacity or transmitted over a narrow-band network, an effective approach is to detect the parts that matter and process them preferentially. Visual saliency has often been used for this purpose, and many methods have been proposed to detect salient objects. However, a salient object is not necessarily the primary subject of an image. Which regions are important is neither clearly defined nor easy to determine, because it generally depends on the context of the image. In this study, we propose a novel framework for detecting important image regions. We leverage an image-captioning technique because it interprets the context of an image when generating sentences. By exploiting the semantic information from image captioning, the proposed method estimates important regions that align more closely with human perception. To evaluate the effectiveness of the proposed method, we created a dataset that defines important regions within images based on subjective evaluation experiments. Using this dataset, we confirmed that the accuracy of the proposed approach was higher than that of conventional saliency-based object detection methods.
Keywords