Jisuanji Kexue yu Tansuo (Journal of Frontiers of Computer Science and Technology), June 2020
Message Clustering Method for Private Binary Protocol
Abstract
Message clustering is one of the main steps of protocol reverse engineering. For private binary protocol messages, current message clustering methods suffer from redundant features when messages are vectorized, and traditional clustering methods make it difficult to determine the cluster centers and the number of clusters. Following the idea of n-gram serialization, a sequence item-location matrix is constructed for the messages, frequent items are mined from it, and message feature vectors are built from those items, which effectively removes sequence noise from message vectorization. The silhouette coefficient is then used to guide divisive hierarchical clustering, which avoids choosing an initial number of clusters or initial cluster centers, so that private binary protocol messages can be clustered without supervision. Experiments are carried out on a dataset of 7 message types from 4 protocols (AIS, DNS, ICMP and ARP), and t-SNE visualization is used to observe the distribution of the messages. The results show that the proposed feature vectorization yields a well-separated distribution and expressive features, and that, compared with traditional clustering methods, silhouette-guided divisive hierarchical clustering significantly improves purity and F1 score.
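The abstract does not define the item-location matrix precisely, so the following is only a minimal sketch under stated assumptions: an "item" is taken to be an (offset, n-gram) pair extracted from a raw message, an item counts as frequent if it appears in at least a fraction `min_support` of all messages, and each message becomes a binary indicator vector over the frequent items. The function names `item_locations`, `frequent_items` and `vectorize` and the parameters `n` and `min_support` are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter


def item_locations(msg: bytes, n: int = 2) -> set:
    """Return the set of (offset, n-gram) items in one message (assumed item definition)."""
    return {(i, msg[i:i + n]) for i in range(len(msg) - n + 1)}


def frequent_items(messages, n: int = 2, min_support: float = 0.3) -> list:
    """Keep items that occur in at least min_support of the messages; the rest is treated as noise."""
    counts = Counter()
    for m in messages:
        counts.update(item_locations(m, n))  # each item counted once per message
    threshold = min_support * len(messages)
    return sorted(item for item, c in counts.items() if c >= threshold)


def vectorize(messages, n: int = 2, min_support: float = 0.3):
    """Map each message to a 0/1 vector indexed by the frequent (offset, n-gram) items."""
    vocab = frequent_items(messages, n, min_support)
    index = {item: j for j, item in enumerate(vocab)}
    vectors = []
    for m in messages:
        v = [0] * len(vocab)
        for item in item_locations(m, n):
            j = index.get(item)
            if j is not None:
                v[j] = 1
        vectors.append(v)
    return vocab, vectors


msgs = [b"\x01\x00AB", b"\x01\x00CD", b"\x02\x07AB"]
vocab, vectors = vectorize(msgs, n=2, min_support=0.5)
print(vocab)    # [(0, b'\x01\x00'), (2, b'AB')]
print(vectors)  # [[1, 1], [1, 0], [0, 1]]
```

Filtering by support is what drops position-specific byte runs that vary from message to message (for example random identifiers), which matches the abstract's goal of removing sequence noise while keeping fields that stay stable within a message type.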
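The abstract states only that the silhouette coefficient guides divisive hierarchical clustering. A minimal sketch, assuming Euclidean distance, bisection of a cluster with a deterministically seeded 2-means, and a greedy rule that keeps the best split only while the mean silhouette coefficient increases, could look like this. The helpers `silhouette`, `two_means` and `divisive_cluster` are hypothetical, and because the silhouette is undefined for a single cluster the sketch always performs at least one split.

```python
import math


def silhouette(points, clusters):
    """Mean silhouette coefficient; clusters is a list of index lists. -1.0 if fewer than 2 clusters."""
    if len(clusters) < 2:
        return -1.0
    label = {i: k for k, c in enumerate(clusters) for i in c}
    total = 0.0
    for i, k in label.items():
        own = clusters[k]
        if len(own) == 1:
            continue  # s(i) = 0 by convention for singleton clusters
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(points[i], points[j]) for j in c) / len(c)
                for m, c in enumerate(clusters) if m != k)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / len(label)


def centroid(points, idx):
    dim = len(points[idx[0]])
    return [sum(points[i][d] for i in idx) / len(idx) for d in range(dim)]


def two_means(points, idx, iters=10):
    """Bisect one cluster; seeds are two mutually distant points. Returns None if no split exists."""
    a = max(idx, key=lambda i: math.dist(points[i], points[idx[0]]))
    b = max(idx, key=lambda i: math.dist(points[i], points[a]))
    ca, cb = list(points[a]), list(points[b])
    for _ in range(iters):
        left = [i for i in idx if math.dist(points[i], ca) <= math.dist(points[i], cb)]
        left_set = set(left)
        right = [i for i in idx if i not in left_set]
        if not left or not right:
            return None  # e.g. all points identical
        ca, cb = centroid(points, left), centroid(points, right)
    return left, right


def divisive_cluster(points):
    """Top-down clustering: apply the best-scoring split while it raises the silhouette."""
    clusters = [list(range(len(points)))]
    best = -1.0
    while True:
        candidate = None
        for k, c in enumerate(clusters):
            if len(c) < 2:
                continue
            split = two_means(points, c)
            if split is None:
                continue
            trial = clusters[:k] + clusters[k + 1:] + [split[0], split[1]]
            score = silhouette(points, trial)
            if candidate is None or score > candidate[0]:
                candidate = (score, trial)
        if candidate is None or candidate[0] <= best:
            return clusters
        best, clusters = candidate


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(sorted(sorted(c) for c in divisive_cluster(pts)))  # [[0, 1, 2], [3, 4, 5]]
```

Neither the number of clusters nor the initial centers is supplied by the caller: splitting stops as soon as no bisection raises the overall silhouette, which is the property the abstract attributes to the silhouette-guided approach.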
Keywords