IEEE Access (Jan 2020)

Performance Comparison of GPU-Based Jacobi Solvers Using CUDA Provided Synchronization Methods

  • Maria Aslam
  • Omer Riaz
  • Shahzad Mumtaz
  • Ali Daniyal Asif

DOI
https://doi.org/10.1109/ACCESS.2020.2973669
Journal volume & issue
Vol. 8
pp. 31792 – 31812

Abstract


In this manuscript, variants of a Jacobi solver implemented on general-purpose graphics processing units (GPGPUs) are proposed and compared. A parallel implementation of the finite element method (FEM) for Poisson's equation was analyzed on a shared-memory architecture and on GPGPUs to identify the computationally most expensive part of the FEM software, which is the linear-algebra Jacobi solver. Sparse matrices were used to represent the system of linear equations. Nine implementations of the Jacobi solver were developed and compared using various synchronization and computation methods: atomicAdd, atomicAdd_block, butterfly communication, grid synchronization, and hybrid and whole-GPU computation. Experiments showed that the Jacobi implementations based on our Butterfly communication method outperformed the critical-execution methods provided by CUDA 10.0, namely atomicAdd, atomicAdd_block and grid synchronization. With double-precision computation, the GPU implementations achieved maximum speedups of 46 times on a GTX 1060 and 60 times on a Quadro P4000 compared with a sequential implementation on a Core i7-8750H. All development was done with the GNU C/C++ compiler 7.3.0 on Ubuntu 18.04 and CUDA 10.0.
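To make the moving parts of the abstract concrete, here is a minimal sequential sketch in C of one Jacobi iteration on a matrix stored in compressed sparse row (CSR) form, $x_i^{(k+1)} = \bigl(b_i - \sum_{j \ne i} a_{ij} x_j^{(k)}\bigr) / a_{ii}$, together with the global reduction needed for the convergence test. That reduction is the step where GPU threads must synchronize, which is what atomicAdd, atomicAdd_block, grid synchronization and butterfly communication compete on. The sketch assumes that "butterfly communication" follows the common XOR-partner (recursive-doubling) pattern, simulated here with a hypothetical `LANES` array of per-thread partial sums; the CSR struct, `LANES`, and the function names are illustrative and are not taken from the paper's kernels.

```c
#include <assert.h>

/* Number of simulated GPU threads ("lanes"); must be a power of two.
   The value 4 is an arbitrary, hypothetical choice for illustration. */
#define LANES 4

/* Hypothetical compressed sparse row (CSR) container; field names are illustrative. */
typedef struct {
    int n;              /* number of rows (square matrix) */
    const int *row_ptr; /* n + 1 offsets into col_idx/val */
    const int *col_idx; /* column index of each nonzero */
    const double *val;  /* value of each nonzero */
} csr_matrix;

/* Butterfly (recursive-doubling) all-reduce: at each step lane i adds the
   value held by partner lane i XOR offset, so after log2(LANES) steps every
   lane holds the global sum without a shared atomic accumulator. */
static void butterfly_allreduce(double lane[LANES]) {
    for (int offset = 1; offset < LANES; offset <<= 1) {
        double next[LANES];
        for (int i = 0; i < LANES; ++i)
            next[i] = lane[i] + lane[i ^ offset];
        for (int i = 0; i < LANES; ++i)
            lane[i] = next[i];
    }
}

/* One Jacobi sweep: x_new[i] = (b[i] - sum_{j != i} a_ij * x_old[j]) / a_ii.
   Row i is treated as owned by simulated lane i % LANES, which accumulates the
   squared update; the partial sums are then combined by the butterfly reduction.
   Returns the squared 2-norm of x_new - x_old. */
static double jacobi_sweep(const csr_matrix *A, const double *b,
                           const double *x_old, double *x_new) {
    double lane[LANES] = {0.0};
    for (int i = 0; i < A->n; ++i) {
        double diag = 0.0, sigma = 0.0;
        for (int k = A->row_ptr[i]; k < A->row_ptr[i + 1]; ++k) {
            int j = A->col_idx[k];
            if (j == i)
                diag = A->val[k];
            else
                sigma += A->val[k] * x_old[j];
        }
        assert(diag != 0.0); /* Jacobi needs a nonzero diagonal entry */
        x_new[i] = (b[i] - sigma) / diag;
        double d = x_new[i] - x_old[i];
        lane[i % LANES] += d * d;
    }
    butterfly_allreduce(lane);
    return lane[0]; /* every lane now holds the same total */
}

/* Iterate until ||x_new - x_old||_2 < tol or max_iter sweeps have run.
   x holds the initial guess on entry and the latest iterate on exit;
   work is caller-provided scratch of length n.
   Returns the number of sweeps performed, or -1 if not converged. */
static int jacobi_solve(const csr_matrix *A, const double *b, double *x,
                        double *work, double tol, int max_iter) {
    for (int it = 1; it <= max_iter; ++it) {
        double sq_update = jacobi_sweep(A, b, x, work);
        for (int i = 0; i < A->n; ++i)
            x[i] = work[i];
        if (sq_update < tol * tol)
            return it;
    }
    return -1;
}
```

As a usage example, the 3×3 tridiagonal matrix with 2 on the diagonal and -1 off the diagonal (the kind of system a 1D Poisson discretization produces) and right-hand side b = (1, 0, 1) has the exact solution x = (1, 1, 1), which the solver approaches from a zero initial guess. On a GPU, the same XOR-partner pattern is typically expressed within a warp by shuffle instructions, while the alternatives named in the abstract would instead funnel every thread's contribution through a shared atomic variable or a grid-wide barrier.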

Keywords