IEEE Access (Jan 2025)
Deep Image Synthesis, Analysis and Indexing Using Integrated CNN Architectures
Abstract
The extensive use of Internet technology is driving a massive increase in multimedia content, and fast, effective image retrieval over a wide range of databases remains a difficult task. Various content-based image retrieval (CBIR) systems have been developed to store and retrieve related images to meet this demand. However, existing systems lack high accuracy because they struggle to distinguish foreground from background objects and suffer from a large semantic gap. The proposed model presents a three-phase approach, comprising image analysis, synthesis, and indexing, that improves image retrieval efficiency and accuracy by integrating deep features of CNN models. First, color images are converted to grayscale, and the analysis phase accelerates feature extraction by applying intensity functions, outer boundary detection, thresholding, connected component labeling, and intensity inversion to the grayscale images. These features are then refined in the synthesis phase through multi-scale detection, enumeration, local binarization, and invariance and covariance computations, which improve the precision of the extracted data. Deep features from CNN models such as VGG19, InceptionV3, and AlexNet are combined with hand-crafted feature vectors to narrow the semantic gap and improve image content analysis, and this fusion significantly increases image retrieval precision. Finally, integrating a bag-of-words (BOW) model in the indexing phase significantly improves retrieval accuracy. The model is evaluated on the Cifar-10, Cifar-100, and Caltech-101 datasets in terms of precision, recall, average retrieval precision (ARP), average retrieval recall (ARR), mean average precision (MAP), and mean average recall (MAR). On Cifar-10, the proposed model achieves a MAP of 95% with VGG19 and 92% with both AlexNet and InceptionV3.
The results show that the proposed model achieves higher precision than state-of-the-art methods reported in the literature.
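The abstract names the analysis-phase operations but not their parameters. The Python sketch below, which uses only NumPy, shows one plausible way to chain grayscale conversion, thresholding, outer boundary detection, connected component labeling, and intensity inversion on a single image. It is not the authors' implementation. The following are all illustrative assumptions: the ITU-R BT.601 grayscale weights, Otsu's method as the thresholding rule, treating bright pixels as foreground, defining the outer boundary as foreground pixels with at least one 4-neighbour in the background, 4-connectivity for labeling, and every function name (to_grayscale, otsu_threshold, outer_boundary, label_components, analyze).

```python
import numpy as np
from collections import deque


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """HxWx3 RGB -> HxW grayscale using ITU-R BT.601 luma weights (assumed)."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb[..., :3].astype(np.float64) @ weights


def otsu_threshold(gray: np.ndarray) -> float:
    """Otsu's method on a 256-bin histogram (a stand-in for the unspecified rule)."""
    levels = np.clip(gray, 0, 255).astype(np.uint8)
    hist = np.bincount(levels.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                    # P(level <= t)
    mu = np.cumsum(prob * np.arange(256))      # cumulative first moment up to t
    mu_total = mu[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        # between-class variance for every candidate threshold t
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b[~np.isfinite(sigma_b)] = 0.0       # empty classes contribute nothing
    return float(np.argmax(sigma_b))


def outer_boundary(mask: np.ndarray) -> np.ndarray:
    """Foreground pixels that have at least one 4-neighbour outside the mask."""
    p = np.pad(mask, 1, constant_values=False)
    interior = (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
                & p[1:-1, :-2] & p[1:-1, 2:])
    return mask & ~interior


def label_components(mask: np.ndarray):
    """Breadth-first connected component labeling with 4-connectivity."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=np.int32)
    current = 0
    for r in range(h):
        for c in range(w):
            if mask[r, c] and labels[r, c] == 0:
                current += 1                   # start a new component
                labels[r, c] = current
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny, nx] and labels[ny, nx] == 0):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current


def analyze(rgb: np.ndarray) -> dict:
    """Hypothetical analysis-phase pipeline for one RGB image."""
    gray = to_grayscale(rgb)
    t = otsu_threshold(gray)
    mask = gray > t                            # assumption: bright = foreground
    labels, n = label_components(mask)
    return {
        "gray": gray,
        "threshold": t,
        "mask": mask,
        "boundary": outer_boundary(mask),
        "labels": labels,
        "num_components": n,
        "inverted": 255.0 - gray,              # intensity inversion
    }
```

On a synthetic 10x10 black image containing two separate 3x3 white squares, analyze should report two components, 18 foreground pixels, and 16 boundary pixels (each square's centre pixel is interior). A production version would more likely call a library routine such as scipy.ndimage.label for the labeling step. The explicit breadth-first search is kept here only so the logic is visible.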
Keywords