Journal of King Saud University: Computer and Information Sciences (Aug 2025)

DBF-PSR: a dual-branch fusion approach to network traffic classification using protocol semantic representation

  • Yaojun Ding,
  • Wei Chen

DOI
https://doi.org/10.1007/s44443-025-00233-w
Journal volume & issue
Vol. 37, no. 7
pp. 1 – 22

Abstract

With the widespread adoption of encrypted traffic and the scarcity of annotated data in specific network traffic domains, traditional network traffic classification methods struggle to perform effectively. In this context, the “self-supervised learning + supervised learning” paradigm has emerged as a promising way to improve classification accuracy. However, existing methods often treat network traffic as opaque data, relying on models to learn semantic representations autonomously from raw inputs while neglecting the rich prior knowledge embedded in network protocols. To address this limitation, we propose a novel network traffic classification model, DBF-PSR. DBF-PSR introduces a specialized tokenization strategy to enhance the semantic clarity of traffic data. During the self-supervised learning phase, it learns generalizable packet-level representations of protocol fields. In the domain-specific supervised learning stage, DBF-PSR uses the packet-level embeddings generated during the previous stage and incorporates a dual-branch complementary mechanism to capture session-level representations. This hierarchical feature learning framework significantly boosts classification performance. We evaluate DBF-PSR on three representative traffic analysis tasks. Experimental results show that DBF-PSR outperforms state-of-the-art methods, achieving Macro F1-score improvements of 1.29% to 11.7% and demonstrating strong adaptability to complex traffic patterns. Furthermore, under few-shot scenarios (with 10%, 20%, and 50% of the training data), DBF-PSR improves Macro F1-scores by 2.35% to 18.93% over baseline methods, indicating its robustness and high accuracy in data-scarce environments.
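
The results above are reported as Macro F1-scores. The abstract does not say how the metric was computed, so the following is only a minimal sketch of the standard definition (the unweighted mean of per-class F1), not the authors' evaluation code. The function name `macro_f1` and the traffic labels used in the example are hypothetical.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("macro_f1 needs at least one sample")

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels, for illustration only.
y_true = ["web", "web", "vpn", "vpn"]
y_pred = ["web", "vpn", "vpn", "vpn"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally regardless of its sample count, Macro F1 penalizes poor performance on rare traffic classes, which is why it is a common choice for imbalanced traffic datasets.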

Keywords