Applied Sciences (May 2022)

A Survey on Malleability Solutions for High-Performance Distributed Computing

  • Jose I. Aliaga,
  • Maribel Castillo,
  • Sergio Iserte,
  • Iker Martín-Álvarez,
  • Rafael Mayo

DOI
https://doi.org/10.3390/app12105231
Journal volume & issue
Vol. 12, no. 10
p. 5231

Abstract

Maintaining a high rate of productivity, in terms of completed jobs per unit of time, in High-Performance Computing (HPC) facilities is a cornerstone of the next generation of exascale supercomputers. Process malleability is presented as a straightforward mechanism to address that issue. Nowadays, the vast majority of HPC facilities are intended for distributed-memory applications based on the Message Passing (MP) paradigm. For this reason, many efforts are based on the Message Passing Interface (MPI), the de facto standard programming model. Malleability aims to rescale executions on the fly, that is, to reconfigure the number and layout of processes in running applications. Process malleability involves reallocating resources within the HPC system, managing the application's processes, and redistributing data among those processes so that execution can resume. This manuscript compiles how different frameworks address process malleability, their main features, their integration in resource management systems, and how they may be used in user codes. This paper is a detailed state-of-the-art survey devised as an entry point for researchers who are interested in process malleability.

Keywords