IEEE Access (Jan 2025)

Hybrid-Hierarchical Synchronization for Resilient Large-Scale SDN Architectures

  • Alessandro Pacini,
  • Davide Scano,
  • Andrea Sgambelluri,
  • Luca Valcarenghi,
  • Alessio Giorgetti

DOI
https://doi.org/10.1109/ACCESS.2025.3527224
Journal volume & issue
Vol. 13
pp. 9032 – 9046

Abstract

Interest in hierarchical Software-Defined Networking (SDN) controllers has grown recently because of their ability to address challenges associated with the SDN paradigm, such as responsiveness and scalability. The hierarchical design enables efficient separation of domain control, using different child instances to manage large-scale networks. The computational resources of the parent controller can then be dedicated to cross-domain decision making, exploiting the network views provided by its children. In this context, the correctness of the process relies entirely on the network view synchronization mechanism, which should be fast and resilient. This paper presents a hybrid synchronization model that combines a hierarchical design with established resilient cluster mechanisms. In this way, high-level control over large-scale networks can be guaranteed even when failures affect any level of the management plane. Specifically, two applications are developed for the ONOS controller to share topology events over low-latency channels from child clusters to parent clusters. The performance of both applications is measured under different cluster configurations, topology sizes, and numbers of generated topology updates. The results show that the proposed approach offers high performance while being fully compliant with the platform for which it is designed, which makes the solution easily extendable to heterogeneous child controllers. Events are propagated from children to parents using gRPC, achieving an end-to-end latency of less than 10 ms under normal conditions and 40-60 ms under high-rate event conditions. Consistency of network views is also guaranteed by strong event ordering and delivery mechanisms. Failures are handled seamlessly at both cluster levels (i.e., parent and child controllers), with a maximum synchronization delay of 2 seconds that is quickly recovered.
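
The abstract mentions strong event ordering and delivery between child and parent clusters but does not describe the mechanism. As an illustration only, the sketch below shows one common way to obtain in-order, duplicate-free delivery: the child tags each topology event with a sequence number, and the parent buffers out-of-order arrivals until the gap is filled. The names `TopologyEvent`, `ParentView`, `seq`, and `receive` are hypothetical and are not taken from the paper or from the ONOS API; the gRPC transport is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class TopologyEvent:
    """Hypothetical topology event sent from a child to its parent."""
    seq: int       # assumed per-child sequence number, starting at 1
    kind: str      # illustrative label, e.g. "LINK_ADDED"
    subject: str   # illustrative identifier of the affected element


@dataclass
class ParentView:
    """Parent-side receiver that applies events strictly in sequence order."""
    next_seq: int = 1
    pending: Dict[int, TopologyEvent] = field(default_factory=dict)
    applied: List[TopologyEvent] = field(default_factory=list)

    def receive(self, event: TopologyEvent) -> None:
        # Drop duplicates: already applied, or already buffered.
        if event.seq < self.next_seq or event.seq in self.pending:
            return
        # Buffer the event, then apply every event that is now contiguous.
        self.pending[event.seq] = event
        while self.next_seq in self.pending:
            self.applied.append(self.pending.pop(self.next_seq))
            self.next_seq += 1


view = ParentView()
# Events arrive out of order and with one duplicate.
for s in (2, 1, 1, 3):
    view.receive(TopologyEvent(s, "LINK_ADDED", f"link-{s}"))
print([e.seq for e in view.applied])
```

In this sketch an event is applied to the parent's view only once all earlier events from the same child have been applied, so a delayed message postpones later updates rather than reordering them.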

Keywords