IEEE Access (Jan 2024)

Early Traffic Classification With Encrypted ClientHello: A Multi-Country Study

  • Danil Shamsimukhametov,
  • Anton Kurapov,
  • Mikhail Liubogoshchev,
  • Evgeny Khorov

DOI
https://doi.org/10.1109/ACCESS.2024.3469730
Journal volume & issue
Vol. 12
pp. 142979 – 142993

Abstract

Quality of service provisioning in modern networks requires traffic to be classified as early as possible according to its requirements and service type. However, traffic classification (TC) becomes increasingly challenging as traffic encryption evolves. The Encrypted ClientHello (ECH) extension to Transport Layer Security (TLS), the most widespread encryption protocol, conceals the most sensitive metadata of TLS-encrypted flows, including the Server Name Indication (SNI), which otherwise serves as ground truth for early TC. Nevertheless, backward compatibility and protocol limitations leave some non-random TLS metadata unencrypted. This paper designs a new early traffic classifier, the hybrid Random Forest Traffic Classifier (hRFTC), which combines unencrypted TLS metadata with statistical features of the traffic flow extracted before any application data arrives from the server side. The paper collects an up-to-date, diverse traffic dataset in several countries of North America, Europe, and Asia; the dataset is available online and is one of the largest, most detailed, and most diverse open-source TC datasets. The paper evaluates state-of-the-art TC algorithms on the collected dataset. The results reveal that the TLS settings that remain unencrypted in the ECH scenario are similar for many multimedia services. Consequently, TC algorithms that rely solely on TLS features achieve an F-score as low as 38.4%. In contrast, the hybrid approach of hRFTC substantially improves classification: hRFTC achieves an F-score of up to 94.6% on the collected dataset, outperforming the best state-of-the-art algorithms.
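
The hybrid idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Packet` and `ClientHelloMeta` types, the chosen fields, and the specific statistics are all assumptions made for illustration, and the paper's actual feature set may differ. The sketch only shows the general shape of the approach, which is to join a few unencrypted ClientHello fields with flow statistics computed over the packets seen before the first application-data packet from the server.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Sequence


@dataclass
class Packet:
    """Hypothetical per-packet record; not a structure from the paper."""
    timestamp: float    # seconds since the start of the flow
    size: int           # payload size in bytes
    from_client: bool   # True if sent by the client
    is_app_data: bool   # True if it carries TLS application data


@dataclass
class ClientHelloMeta:
    """Fields assumed to remain visible in the outer ClientHello under ECH."""
    cipher_suites: Sequence[int]
    extensions: Sequence[int]
    record_length: int


def early_packets(packets: Sequence[Packet]) -> List[Packet]:
    """Keep only packets seen before the server's first application data."""
    kept: List[Packet] = []
    for pkt in packets:
        if not pkt.from_client and pkt.is_app_data:
            break
        kept.append(pkt)
    return kept


def tls_features(meta: ClientHelloMeta) -> List[float]:
    """Illustrative features derived from unencrypted TLS metadata."""
    return [
        float(len(meta.cipher_suites)),
        float(len(meta.extensions)),
        float(meta.record_length),
    ]


def flow_stats(packets: Sequence[Packet]) -> List[float]:
    """Illustrative statistics over packet sizes and inter-arrival gaps."""
    sizes = [p.size for p in packets]
    gaps = [b.timestamp - a.timestamp for a, b in zip(packets, packets[1:])]
    return [
        float(len(sizes)),                                     # packet count
        float(mean(sizes)) if sizes else 0.0,                  # mean size
        float(pstdev(sizes)) if sizes else 0.0,                # size spread
        float(mean(gaps)) if gaps else 0.0,                    # mean gap
        float(sum(p.size for p in packets if p.from_client)),  # client bytes
    ]


def hybrid_features(meta: ClientHelloMeta, packets: Sequence[Packet]) -> List[float]:
    """Concatenate the TLS features with the early-flow statistics."""
    return tls_features(meta) + flow_stats(early_packets(packets))
```

A vector built this way could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, with one vector per flow and the service type as the label.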

Keywords