IEEE Access (Jan 2024)
Detecting Anomalies in Attributed Networks Through Sparse Canonical Correlation Analysis Combined With Random Masking and Padding
Abstract
Attributed networks are prevalent in today's information infrastructure, and their node attributes enrich knowledge discovery. Anomaly detection in attributed networks is gaining attention for its potential uses in cybersecurity, finance, and healthcare. Capturing the complex relationship between node attributes and network topology is crucial for attributed network embedding and anomaly detection. Nevertheless, few approaches directly model the relationship between these two views, the node attributes and the network topology. Approaches based on reconstruction error rely on simple mappings, which carry a substantial risk of overfitting in high-dimensional data: the model learns patterns that are specific to the training data and fails to generalize to new data. To address these limitations, we propose a novel method for anomaly detection in attributed networks that combines Sparse Canonical Correlation Analysis (SCCA) with Random Masking and Padding (RMP). This combination addresses both high-dimensional data and attribute sparsity, which are prevalent issues in anomaly detection. Unlike previous work, which mainly treats dimensionality reduction and attribute sparsity separately, our method handles them jointly to improve detection performance. First, we randomly mask and pad nodes in the attributed network and use a Graph Convolutional Network (GCN) to map them to a latent space.
Next, we align the distributions of the latent representations of node attributes and graph structure using Kullback-Leibler (KL) divergence regularization, which makes the two views more comparable. Finally, we use SCCA to quantify the correlation between the node-attribute and network-structure views in the latent space. SCCA adds a sparsity constraint that makes the model select fewer variables; by highlighting only the key variables, it improves interpretability and reduces overfitting in high-dimensional data analysis. We train the model by maximizing the correlation between the attribute and structural views of normal nodes, and we detect anomalies by measuring the correlation between these two views. To our knowledge, our approach is the first of its kind to address the fundamental problems that prevent efficient and accurate anomaly identification, thereby establishing a new standard in this field. We evaluate the proposed model extensively on four real-world datasets, where it proves effective in comparison with state-of-the-art approaches. This empirical evaluation supports the potential of the proposed approach as a pivotal tool for advancing anomaly detection research and applications.
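The pipeline described above (mask and pad, GCN mapping, KL alignment, correlation-based scoring) can be illustrated with a minimal sketch. It is not the authors' implementation. All function names, the toy graph, the untrained random weights, the use of adjacency rows as structure-view input, and the score "1 minus correlation" are assumptions made for illustration. In particular, a plain per-node Pearson correlation stands in for SCCA, and no training is performed.

```python
import math
import random


def mask_attributes(X, mask_rate, rng, pad_value=0.0):
    """Overwrite the attribute rows of randomly chosen nodes with a padding value."""
    n = len(X)
    k = max(1, int(round(mask_rate * n)))
    masked = set(rng.sample(range(n), k))
    X_masked = [[pad_value] * len(row) if i in masked else list(row)
                for i, row in enumerate(X)]
    return X_masked, masked


def normalized_adjacency(A):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a standard GCN layer."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A_norm, H, W):
    """One GCN propagation step: relu(A_norm @ H @ W)."""
    Z = matmul(matmul(A_norm, H), W)
    return [[max(0.0, v) for v in row] for row in Z]


def softmax(v):
    """Turn a latent vector into a probability distribution."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def pearson(u, v):
    """Pearson correlation of two vectors; 0.0 if either has zero variance."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0.0 or sv == 0.0:
        return 0.0
    return cov / (su * sv)


def anomaly_scores(Z_attr, Z_struct):
    """Assumed score: 1 - correlation between a node's two latent views."""
    return [1.0 - pearson(a, s) for a, s in zip(Z_attr, Z_struct)]


rng = random.Random(0)
# Toy graph: a 4-node path 0-1-2-3 with 3 made-up attributes per node.
A = [[0.0, 1.0, 0.0, 0.0],
     [1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0],
     [0.0, 0.0, 1.0, 0.0]]
X = [[1.0, 0.0, 2.0],
     [0.5, 1.0, 1.5],
     [0.0, 2.0, 1.0],
     [3.0, 0.0, 0.0]]
# Untrained random weights; a real model would learn these.
W_attr = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(3)]
W_struct = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

X_masked, masked = mask_attributes(X, 0.25, rng)
A_norm = normalized_adjacency(A)
Z_attr = gcn_layer(A_norm, X_masked, W_attr)   # attribute view
Z_struct = gcn_layer(A_norm, A, W_struct)      # structure view (adjacency rows as input)
mean_kl = sum(kl_divergence(softmax(a), softmax(s))
              for a, s in zip(Z_attr, Z_struct)) / len(A)

print("masked nodes:", sorted(masked))
print("mean KL between views:", mean_kl)
print("anomaly scores:", anomaly_scores(Z_attr, Z_struct))
```

In a trained model along these lines, the mean KL term would act as a regularizer during optimization, and a sparse CCA solver would replace the plain correlation, so the numbers printed here carry no meaning beyond showing how the pieces connect.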
Keywords