Graph neural networks and cross-protocol analysis for detecting malicious IP addresses

Yonghong Huang; Joanna Negrete; John Wagener; Celeste Fralick; Armando Rodriguez; Eric Peterson; Adam Wosotowsky

doi:10.1007/s40747-022-00838-y

Complex & Intelligent Systems (Sep 2022)

DOI: https://doi.org/10.1007/s40747-022-00838-y
Journal volume & issue: Vol. 9, no. 4
pp. 3857 – 3869

Abstract

An internet protocol (IP) address is the foundation of the Internet, allowing connectivity between people, servers, Internet of Things devices, and services across the globe. Knowing what is connecting to what, and where connections are initiated, is crucial to accurately assessing a company's or individual's security posture. IP reputation assessment can be quite complex because of the numerous services that may be hosted on a single IP address. For example, an IP might be serving millions of websites for millions of different companies, as web hosting companies often do, or it could be a large email system sending and receiving emails for millions of independent entities. The heterogeneous nature of an IP address typically makes it challenging to interpret the security risk. To make matters worse, adversaries understand this complexity and leverage the ambiguous nature of IP reputation to further exploit unsuspecting Internet users or devices connected to the Internet. In addition, traditional techniques like dirty-listing cannot react quickly enough to changes in the security climate, nor can they scale far enough to detect new exploits that may be created and disappear in minutes. In this paper, we introduce the use of cross-protocol analysis and graph neural networks (GNNs) in semi-supervised learning to address the speed and scalability of assessing IP reputation. In the cross-protocol supervised approach, we combine features from the web, email, and domain name system (DNS) protocols to identify those that are most useful in discriminating between suspicious and benign IPs. In our second experiment, we take the most discriminant features and incorporate them into the graph as node features. We use GNNs to pass messages from node to node, propagating the signal to the neighbors while the originating nodes are in turn influenced by their neighboring nodes. Thanks to the relational graph structure, we can use only a small portion of labeled data and train the algorithm in a semi-supervised approach. Our dataset represents real-world data that is sparse: only a small percentage of the IPs have verified clean or suspicious labels, but the IPs are connected. The experimental results demonstrate that the system can achieve 85.28% accuracy in detecting malicious IP addresses at scale with only 5% of labeled data.
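
The message-passing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it has no learned weights, it uses plain mean aggregation with self-loops, and the graph, the IP addresses (taken from documentation ranges), the single "suspicion score" feature, and the function names propagate and labeled_loss are all hypothetical. It only shows how a node's features move toward those of its neighbors, and how a loss can be computed on the labeled nodes alone, as in semi-supervised training.

```python
from typing import Dict, List, Tuple

def propagate(features: Dict[str, List[float]],
              edges: List[Tuple[str, str]]) -> Dict[str, List[float]]:
    """One round of mean-aggregation message passing (no learned weights)."""
    # Each node starts with a self-loop so it keeps part of its own signal.
    neighbors = {node: {node} for node in features}
    for a, b in edges:  # undirected edges
        neighbors[a].add(b)
        neighbors[b].add(a)
    updated = {}
    for node, nbrs in neighbors.items():
        dim = len(features[node])
        # New feature vector = average over the node and its neighbors.
        updated[node] = [
            sum(features[n][i] for n in nbrs) / len(nbrs) for i in range(dim)
        ]
    return updated

def labeled_loss(scores: Dict[str, List[float]],
                 labels: Dict[str, float]) -> float:
    """Mean squared error over labeled nodes only; unlabeled nodes are skipped."""
    return sum((scores[n][0] - y) ** 2 for n, y in labels.items()) / len(labels)

# Hypothetical toy graph: a chain of four IPs with one feature each.
# 1.0 = known suspicious, 0.0 = known clean, 0.5 = unlabeled (no prior).
features = {
    "192.0.2.1": [1.0],
    "192.0.2.2": [0.5],
    "192.0.2.3": [0.5],
    "192.0.2.4": [0.0],
}
edges = [
    ("192.0.2.1", "192.0.2.2"),
    ("192.0.2.2", "192.0.2.3"),
    ("192.0.2.3", "192.0.2.4"),
]
labels = {"192.0.2.1": 1.0, "192.0.2.4": 0.0}  # only two nodes are labeled

scores = propagate(features, edges)
loss = labeled_loss(scores, labels)
```

After one round, the unlabeled node next to the suspicious IP ends up with a higher score than the unlabeled node next to the clean IP, while the labeled nodes are themselves pulled toward their neighbors. A real GNN would add trainable weight matrices and nonlinearities to each round and minimize the labeled-node loss by gradient descent.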

Published in Complex & Intelligent Systems

ISSN: 2199-4536 (Print); 2198-6053 (Online)
Publisher: Springer
Country of publisher: Switzerland
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science; Technology: Technology (General): Industrial engineering. Management engineering: Information technology
Website: https://www.springer.com/journal/40747
