IEEE Access (Jan 2024)

Elephant Flow Detection With Random Forest Models Under Programmable Network Dataplane Constraints

  • Piotr Jurkiewicz
  • Bartosz Kadziolka
  • Miroslaw Kantor
  • Robert Wojcik
  • Jerzy Domzal

DOI: https://doi.org/10.1109/ACCESS.2024.3485588
Journal volume & issue: Vol. 12, pp. 158561–158578

Abstract

This paper investigates the application of tree-based machine learning classifiers for flow-based traffic engineering, focusing on the binary classification of IP network flows into mice (short flows) and elephants (long flows) using 5-tuple header fields from the first packet. Unlike prior studies on network flow classification, our analysis uses performance metrics normalized by traffic coverage, ensuring relevance for traffic engineering and QoS applications. We also evaluate models within the constraints of programmable switching hardware, such as the Intel Tofino P4 chip. Our findings show that such constrained models can achieve high accuracy while performing inference at line rate in the dataplane. Additionally, we reveal a trade-off between tree depth and input format, with bit transformations enabling more efficient feature extraction at lower depths. Our results show that optimal tree depths range from 15 to 25 levels, depending on the input format. The most effective model employs extremely randomized trees with bit-transformed input and trees of depth 20.
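To make two ideas from the abstract concrete, the following minimal sketch shows one plausible form of a bit-transformed 5-tuple input and one simple traffic-coverage measure. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the encoding or the metric definitions, so the helper names (`to_bits`, `five_tuple_to_bits`, `traffic_coverage`), the IPv4-only field widths, the most-significant-bit-first ordering, and the definition of coverage as the share of bytes in flows predicted to be elephants are all hypothetical.

```python
import ipaddress

# Assumed field widths in bits for an IPv4 5-tuple:
# source IP, destination IP, source port, destination port, protocol.
FIELD_WIDTHS = (32, 32, 16, 16, 8)


def to_bits(value, width):
    """Return `value` as a list of `width` bits, most significant bit first."""
    return [(value >> shift) & 1 for shift in range(width - 1, -1, -1)]


def five_tuple_to_bits(src_ip, dst_ip, src_port, dst_port, proto):
    """Encode an IPv4 5-tuple as a flat 104-element 0/1 feature vector.

    Each bit becomes its own feature, so a tree can split on a single
    prefix or port bit instead of on a threshold over a large integer.
    """
    values = (
        int(ipaddress.IPv4Address(src_ip)),
        int(ipaddress.IPv4Address(dst_ip)),
        src_port,
        dst_port,
        proto,
    )
    bits = []
    for value, width in zip(values, FIELD_WIDTHS):
        bits.extend(to_bits(value, width))
    return bits


def traffic_coverage(flow_bytes, predicted_elephant):
    """Fraction of all bytes carried by flows predicted to be elephants.

    Assumed definition for illustration only; the paper may define its
    coverage-normalized metrics differently.
    """
    total = sum(flow_bytes)
    covered = sum(
        size for size, is_elephant in zip(flow_bytes, predicted_elephant)
        if is_elephant
    )
    return covered / total if total else 0.0


# Example: one HTTPS flow's first-packet header, and three flows of which
# only the largest is predicted to be an elephant.
features = five_tuple_to_bits("10.0.0.1", "192.168.1.20", 443, 51512, 6)
coverage = traffic_coverage([1_000, 5_000_000, 200], [False, True, False])
```

A vector like `features` could be passed to any tree-based classifier. The coverage value illustrates why byte-weighted metrics matter for traffic engineering: here a single flow out of three is flagged, yet it accounts for nearly all of the traffic.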

Keywords