Toward High-Performance Computing and Big Data Analytics Convergence: The Case of Spark-DIY

Silvina Caino-Lores; Jesus Carretero; Bogdan Nicolae; Orcun Yildiz; Tom Peterka

doi:10.1109/ACCESS.2019.2949836

IEEE Access (Jan 2019)

Affiliations

Silvina Caino-Lores: Department of Computer Science and Engineering, Computer Architecture and Technology Area (ARCOS), University Carlos III of Madrid, Leganés, Spain
Jesus Carretero: Department of Computer Science and Engineering, Computer Architecture and Technology Area (ARCOS), University Carlos III of Madrid, Leganés, Spain
Bogdan Nicolae: Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA
Orcun Yildiz: Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA
Tom Peterka: Mathematics and Computer Science Division, Argonne National Laboratory, Lemont, IL, USA

DOI: https://doi.org/10.1109/ACCESS.2019.2949836
Journal volume & issue: Vol. 7
pp. 156929 – 156955

Abstract

Convergence between high-performance computing (HPC) and big data analytics (BDA) is now an established research area that has spawned new opportunities for unifying the platform layer and data abstractions in these ecosystems. This work presents an architectural model that enables the interoperability of established BDA and HPC execution models. The model reflects the key design features of interest to both the HPC and BDA communities and includes an abstract data collection and operational model that provides a unified interface for hybrid applications. The architecture can be implemented in different ways, depending on the process- and data-centric platforms of choice and the mechanisms put in place to meet its requirements. The paper introduces the Spark-DIY platform as a prototype implementation of the proposed architecture. Spark-DIY preserves the interfaces and execution environment of the popular BDA platform Apache Spark, making it compatible with any Spark-based application and tool, while providing efficient communication and kernel execution via DIY, a powerful communication pattern library built on top of MPI. The performance of Spark-DIY is then analyzed by building a representative use case from the hydrogeology domain, EnKF-HGS. This application is a clear example of how current HPC simulations are evolving toward hybrid HPC-BDA applications, integrating HPC simulations within a BDA environment.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
