Sensors (Dec 2022)
CGUN-2A: Deep Graph Convolutional Network via Contrastive Learning for Large-Scale Zero-Shot Image Classification
Abstract
Taxonomy illustrates that natural creatures can be classified with a hierarchy. The connections between species are explicit and objective and can be organized into a knowledge graph (KG). It is a challenging task to mine features of known categories from KG and to reason on unknown categories. Graph Convolutional Network (GCN) has recently been viewed as a potential approach to zero-shot learning. GCN enables knowledge transfer by sharing the statistical strength of nodes in the graph. More layers of graph convolution are stacked in order to aggregate the hierarchical information in the KG. However, the Laplacian over-smoothing problem will be severe as the number of GCN layers deepens, which leads the features between nodes toward a tendency to be similar and degrade the performance of zero-shot image classification tasks. We consider two parts to mitigate the Laplacian over-smoothing problem, namely reducing the invalid node aggregation and improving the discriminability among nodes in the deep graph network. We propose a top-k graph pooling method based on the self-attention mechanism to control specific node aggregation, and we introduce a dual structural symmetric knowledge graph additionally to enhance the representation of nodes in the latent space. Finally, we apply these new concepts to the recently widely used contrastive learning framework and propose a novel Contrastive Graph U-Net with two Attention-based graph pooling (Att-gPool) layers, CGUN-2A, which explicitly alleviates the Laplacian over-smoothing problem. To evaluate the performance of the method on complex real-world scenes, we test it on the large-scale zero-shot image classification dataset. Extensive experiments show the positive effect of allowing nodes to perform specific aggregation, as well as homogeneous graph comparison, in our deep graph network. We show how it significantly boosts zero-shot image classification performance. The Hit@1 accuracy is 17.5% relatively higher than the baseline model on the ImageNet21K dataset.
Keywords