Digital Communications and Networks (Oct 2023)

A malware propagation prediction model based on representation learning and graph convolutional networks

  • Tun Li,
  • Yanbing Liu,
  • Qilie Liu,
  • Wei Xu,
  • Yunpeng Xiao,
  • Hong Liu

Journal volume & issue
Vol. 9, no. 5
pp. 1090 – 1100

Abstract

Traditional malware research has focused mainly on recognition and detection, with little attention to propagation trends or to predicting which nodes will be infected next. The complexity of network structure, the diversity of network nodes, and the sparsity of data all make propagation difficult to predict. This paper proposes a malware propagation prediction model based on representation learning and Graph Convolutional Networks (GCN) to address these problems. First, because sparse node interaction data in the malware propagation network makes the calculation of infection intensity inaccurate, a tensor-based mechanism is proposed to mine the infection intensity among nodes while retaining network structure information. The influence of the relationships between nodes on infection intensity is also analyzed. Second, given the diversity and complexity of the content and structure of infected and normal nodes in the network, and considering the strengths of representation learning in feature extraction, a corresponding representation learning method is adopted for the infection intensity characteristics among nodes. This method efficiently computes the relationships between entities and relations in a low-dimensional space, yielding low-dimensional, dense, real-valued representations of the propagation spatial data. A new method, Tensor2vec, is also designed to learn the latent structural features of malware propagation. Finally, considering the ability of GCN to convolve over non-Euclidean data, a dynamic prediction model of malware propagation based on representation learning and GCN is proposed to address the timeliness problem of the malware propagation carrier. Experimental results show that the proposed model can effectively predict the behavior of nodes in the network and reveal how different node characteristics influence malware propagation.
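
To make concrete what graph convolution over non-Euclidean data means, the following minimal sketch implements one layer of the widely used GCN propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), in NumPy. It is not the model proposed in the paper: the 4-node graph, the one-hot node features (standing in for learned embeddings such as those Tensor2vec would produce), the random weights, and the name gcn_layer are all illustrative assumptions.

```python
import numpy as np

def gcn_layer(adj, features, weights):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])       # add self-loops so each node keeps its own features
    deg = a_hat.sum(axis=1)                  # node degrees (always >= 1 thanks to self-loops)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Symmetric normalization: scale rows and columns by D^-1/2
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ features @ weights, 0.0)  # ReLU activation

# Hypothetical propagation graph: node 1 is connected to nodes 0, 2 and 3
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
features = np.eye(4)                         # placeholder node features (assumption)
rng = np.random.default_rng(0)
weights = rng.normal(size=(4, 2))            # untrained weights, for illustration only

hidden = gcn_layer(adj, features, weights)
print(hidden.shape)                          # one 2-dimensional vector per node
```

Each output row mixes a node's own features with those of its neighbors, which is how a GCN can use the structure of an irregular graph. A prediction model would stack such layers and train the weights against observed infection labels.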

Keywords