IEEE Access (Jan 2021)

Mitigating Virtualization Failures Through Migration to a Co-Located Hypervisor

  • Frederico Cerveira,
  • Raul Barbosa,
  • Henrique Madeira

DOI
https://doi.org/10.1109/ACCESS.2021.3098644
Journal volume & issue
Vol. 9
pp. 105255 – 105269

Abstract

Many organizations are moving their systems to the cloud, where providers consolidate multiple clients using virtualization, which creates challenges for business-critical applications. Research has shown that hypervisors fail, often causing common-mode failures that may abruptly disrupt dozens of virtual machines simultaneously. We hypothesize and empirically show that a significant percentage of virtual machines affected by a hypervisor failure are capable of continuing execution on a new hypervisor. Supported by this observation, we design a technique for recovering from hypervisor failures through efficient virtual machine migration to a co-located hypervisor, which allows virtual machines to continue executing with minimal downtime and which can be transparently applied to existing applications. We evaluate a proof-of-concept implementation using fault injection of hardware and software faults and show that it can recover, on average, 41-46% of all virtual machines, with a mean virtual machine downtime of 3 seconds.

Keywords