IEEE Access (Jan 2024)
Tempus: Probabilistic Network Latency Verification
Abstract
Modern networks must meet stringent low-latency requirements. Verifying network latency is crucial for identifying bottlenecks that may degrade user experience and slow business operations. Network operators today rely heavily on high-fidelity simulators to validate latency requirements. However, detailed simulators are slow and consequently do not scale. Network verifiers are emerging as a powerful alternative means of validation. They operate on an abstract model of network behavior. Although verifiers are faster than their simulation-based counterparts, abstracting away the details of networks comes at a cost: state-of-the-art verifiers have major limitations, such as not modeling failures or latency, that prevent them from reliably verifying latency. This paper bridges this gap by proposing a scalable latency verification method, Tempus, that decomposes latency verification into two phases (functional and temporal verification) and refines advanced abstract network models to enable fast temporal verification. Concretely, given a source and destination pair and the empirical latency measurements of network components (e.g., the queueing delay), Tempus returns the probability of reaching the destination from the source within a time frame under all failure scenarios. We evaluate Tempus on both wide-area and datacenter networks and show that it is fast and scalable. For instance, Parsimon, a state-of-the-art fast network simulator, requires more than one month to simulate all failure scenarios of an 8-ary fat-tree network with 100 Gbps links under 25% load. Tempus, in contrast, verifies the latency of the same network among all (source, destination) pairs and under all failure scenarios in only 8 minutes and 32 seconds, a speedup of three orders of magnitude. We also demonstrate that Tempus accurately approximates network latency under various degrees of load.
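To make the quantity being verified concrete, the following is a minimal illustrative sketch of the kind of probability described above; it is not the Tempus algorithm. It assumes that each failure scenario has a known probability and a single forwarding path, that per-hop latencies are independent, and that each hop's latency is drawn uniformly from its empirical samples. The function names `path_within_deadline` and `reach_probability` and the toy numbers are hypothetical.

```python
from itertools import product


def path_within_deadline(hop_samples, deadline):
    """Estimate P(sum of per-hop latencies <= deadline) for one path.

    hop_samples: one list of empirical latency samples per hop.
    Assumption: hops are independent and samples are equally likely,
    so every combination of per-hop samples has the same weight.
    """
    total = 0
    within = 0
    for combo in product(*hop_samples):
        total += 1
        if sum(combo) <= deadline:
            within += 1
    return within / total


def reach_probability(scenarios, deadline):
    """Combine per-scenario results into one probability.

    scenarios: list of (scenario_probability, hop_samples or None),
    where None means the destination is unreachable in that scenario.
    """
    return sum(
        p * path_within_deadline(hops, deadline)
        for p, hops in scenarios
        if hops is not None
    )


# Toy input (hypothetical values, arbitrary time units).
scenarios = [
    (0.90, [[1, 2], [1, 3]]),        # no failure: two-hop primary path
    (0.09, [[1, 2], [1, 3], [2]]),   # one link down: three-hop backup path
    (0.01, None),                    # destination disconnected
]
probability = reach_probability(scenarios, deadline=4)
```

Enumerating every sample combination is exponential in the path length, so this sketch only serves to pin down the definition on small inputs; a scalable verifier needs a more compact representation of the latency distributions.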
Keywords