Visual Informatics (Mar 2018)
A visual analytics system for optimizing the performance of large-scale networks in supercomputing systems
Abstract
The overall efficiency of an extreme-scale supercomputer largely relies on the performance of its network interconnects. Several of the state of the art supercomputers use networks based on the increasingly popular Dragonfly topology. It is crucial to study the behavior and performance of different parallel applications running on Dragonfly networks in order to make optimal system configurations and design choices, such as job scheduling and routing strategies. However, in order to study these temporal network behavior, we would need a tool to analyze and correlate numerous sets of multivariate time-series data collected from the Dragonfly’s multi-level hierarchies. This paper presents such a tool–a visual analytics system–that uses the Dragonfly network to investigate the temporal behavior and optimize the communication performance of a supercomputer. We coupled interactive visualization with time-series analysis methods to help reveal hidden patterns in the network behavior with respect to different parallel applications and system configurations. Our system also provides multiple coordinated views for connecting behaviors observed at different levels of the network hierarchies, which effectively helps visual analysis tasks. We demonstrate the effectiveness of the system with a set of case studies. Our system and findings can not only help improve the communication performance of supercomputing applications, but also the network performance of next-generation supercomputers. Keywords: Supercomputing, Parallel communication network, Dragonfly networks, Time-series data, Performance analysis, Visual analytics