Applied Sciences (May 2024)

FAPR: An Adaptive Approach to Link Failure Recovery in SDN with High Speed and Low Interruption Rate

  • Haijun Qin,
  • Jue Chen,
  • Xihe Qiu,
  • Xinyu Zhang,
  • Meng Cui

DOI
https://doi.org/10.3390/app14114719
Journal volume & issue
Vol. 14, no. 11
p. 4719

Abstract

Link failures are the most common type of fault in software-defined networking (SDN), which makes recovering from them a crucial aspect of SDN fault tolerance. Existing strategies include proactive and reactive approaches. Proactive schemes pre-deploy backup paths for fast recovery but may exhaust resources, while reactive schemes calculate paths only upon failure, resulting in longer recovery times but better backup paths. This paper proposes a single link failure recovery strategy that combines these two schemes, termed flow-aware pro-reactive (FAPR), with the aim of achieving high-speed recovery while ensuring high-quality backup paths. Specifically, before any link failure occurs, the controller uses pro-VLAN to install backup paths for each link in the switches and also precalculates multiple backup paths for each link, which it keeps in the controller. When a link fails, pro-VLAN, a method based on the proactive approach, first restores connectivity automatically, without involving the controller. At the same time, the controller identifies the types of the affected flows from transport-layer data, obtains several key network indicators for the backup paths, and then selects the most suitable path for each type of flow on the basis of the current network view. Simulation results and theoretical analysis show that the FAPR scheme reduces recovery time by over 65% compared with the reactive scheme. The interruption rate of flows after fault recovery is reduced by 20% and 50% compared with the reactive and proactive schemes, respectively. In addition, owing to the way pro-VLAN works, FAPR requires up to 85% fewer backup flow rules than the proactive scheme. In conclusion, FAPR promises the highest failure recovery speed and the lowest interruption rate among the three methods, and helps improve the quality of network services.
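The flow-aware path selection step described in the abstract could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not name the flow types, the network indicators, or the scoring rule, so the TCP/UDP split, the three metrics (delay, loss rate, free bandwidth), the weights, and all function names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathMetrics:
    """Assumed indicators for one precalculated backup path."""
    name: str
    delay_ms: float
    loss_rate: float
    free_bw_mbps: float


# Hypothetical weights per flow type: (delay, loss, bandwidth).
# Assumption: UDP flows are treated as delay-sensitive and TCP flows
# as throughput-sensitive; the abstract does not specify this.
WEIGHTS = {
    "udp": (0.6, 0.3, 0.1),
    "tcp": (0.2, 0.3, 0.5),
}


def classify_flow(ip_proto: int) -> str:
    """Map an IP protocol number to a flow type (6 = TCP, 17 = UDP)."""
    return {6: "tcp", 17: "udp"}.get(ip_proto, "tcp")


def _normalize(values, higher_is_better):
    """Scale values to [0, 1] so that 1 is always the best candidate."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    if higher_is_better:
        return [(v - lo) / (hi - lo) for v in values]
    return [(hi - v) / (hi - lo) for v in values]


def select_backup_path(flow_type, candidates):
    """Return the candidate with the highest weighted score for this flow type."""
    w_delay, w_loss, w_bw = WEIGHTS[flow_type]
    delay = _normalize([p.delay_ms for p in candidates], higher_is_better=False)
    loss = _normalize([p.loss_rate for p in candidates], higher_is_better=False)
    bw = _normalize([p.free_bw_mbps for p in candidates], higher_is_better=True)
    scores = [
        w_delay * d + w_loss * l + w_bw * b
        for d, l, b in zip(delay, loss, bw)
    ]
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best]


if __name__ == "__main__":
    # Made-up example values, not measurements from the paper.
    backups = [
        PathMetrics("A", delay_ms=5.0, loss_rate=0.01, free_bw_mbps=100.0),
        PathMetrics("B", delay_ms=20.0, loss_rate=0.001, free_bw_mbps=800.0),
    ]
    for proto in (6, 17):
        flow_type = classify_flow(proto)
        print(flow_type, "->", select_backup_path(flow_type, backups).name)
```

In this sketch the proactive fast path (pro-VLAN) is assumed to have already restored connectivity; the controller would then run a selection like this per affected flow and reroute only where the chosen path differs from the one currently in use.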

Keywords