Journal of Big Data (Apr 2021)

PCJ Java library as a solution to integrate HPC, Big Data and Artificial Intelligence workloads

  • Marek Nowicki,
  • Łukasz Górski,
  • Piotr Bała

DOI
https://doi.org/10.1186/s40537-021-00454-6
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 21

Abstract

With the development of peta- and exascale computational systems, there is growing interest in running Big Data and Artificial Intelligence (AI) applications on them. Big Data and AI applications are implemented in Java, Scala, Python and other languages that are not widely used in High-Performance Computing (HPC), which is still dominated by C and Fortran. Moreover, they are based on dedicated environments such as Hadoop or Spark, which are difficult to integrate with traditional HPC management systems. We have developed the Parallel Computing in Java (PCJ) library, a tool for scalable high-performance computing and Big Data processing in Java. In this paper, we present the basic functionality of the PCJ library with examples of highly scalable applications running on large-scale resources. Performance results are presented for different classes of applications, including traditional computationally intensive (HPC) workloads (e.g. stencil) as well as communication-intensive algorithms such as the Fast Fourier Transform (FFT). We present implementation details and performance results for Big Data processing on petascale systems. Examples of large-scale AI workloads parallelized using PCJ are also presented.

Keywords