EPJ Web of Conferences (Jan 2019)
Managing a heterogeneous scientific computing cluster with cloud-like tools: ideas and experience
Abstract
Obtaining CPU cycles on an HPC cluster is nowadays relatively simple and sometimes even cheap for academic institutions. However, in most of the cases providers of HPC services would not allow changes on the configuration, implementation of special features or a lower-level control on the computing infrastructure, for example for testing experimental configurations. The variety of use cases proposed by several departments of the University of Torino, including ones from solid-state chemistry, computational biology, genomics and many others, called for different and sometimes conflicting configurations; furthermore, several R&D activities in the field of scientific computing, with topics ranging from GPU acceleration to Cloud Computing technologies, needed a platform to be carried out on. The Open Computing Cluster for Advanced data Manipulation (OCCAM) is a multi-purpose flexible HPC cluster designed and operated by a collaboration between the University of Torino and the Torino branch of the Istituto Nazionale di Fisica Nucleare. It is aimed at providing a flexible and reconfigurable infrastructure to cater to a wide range of different scientific computing needs, as well as a platform for R&D activities on computational technologies themselves. We describe some of the use cases that prompted the design and construction of the system, its architecture and a first characterisation of its performance by some synthetic benchmark tools and a few realistic use-case tests.