Jisuanji kexue yu tansuo (Aug 2020)

Parallelization and Optimization of Application for Phonon BTE

  • WEN Minhua, LIU Yongzhi, BAO Hua, HU Yue, SHEN Yongxing, WEI Jianwen, LIN Xinhua

DOI
https://doi.org/10.3778/j.issn.1673-9418.1909072
Journal volume & issue
Vol. 14, no. 8
pp. 1288 – 1297

Abstract

Heat conduction at the submicron scale can be predicted effectively with the Boltzmann transport equation (BTE) for phonons. Compared with stochastic methods, deterministic methods, represented by the finite volume method for the phonon BTE, are considered more promising for practical engineering problems. However, the finite volume method requires a large number of iteration steps and a long iteration time. To address this, the paper proposes a GPU parallel acceleration scheme for the iterative solution part of the phonon BTE and designs a suitable thread allocation method and data storage format. It also applies loop unrolling and merging of kernel functions to optimize the iteration process. In addition, a multi-GPU version of the phonon BTE solver is implemented using a direction-based parallel strategy with three communication approaches: MPI+CUDA, CUDA-Aware MPI, and NCCL (NVIDIA Collective Communications Library). Experimental results show that the single-GPU version on a V100 is up to 31.5X faster than the serial implementation on an Intel Xeon Gold 6248. The multi-GPU version with NCCL achieves 83% parallel efficiency on 8 DGX-2 nodes with a total of 128 V100 GPUs, which is 57% higher than the parallel method using MPI+CUDA.
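
The direction-based parallel strategy mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each discrete direction's phonon intensity can be updated independently within an iteration, and that the only cross-device step is a weighted sum over all directions (the role an all-reduce such as the one provided by NCCL or MPI would play). The function names `split_directions`, `local_weighted_sum` and `all_reduce_sum` are hypothetical, and the "ranks" here are simulated in a single Python process.

```python
def split_directions(num_directions, num_ranks):
    """Assign contiguous blocks of direction indices to ranks (one rank per GPU).

    The first (num_directions % num_ranks) ranks receive one extra direction,
    so block sizes differ by at most one.
    """
    base, extra = divmod(num_directions, num_ranks)
    ranges = []
    start = 0
    for rank in range(num_ranks):
        count = base + (1 if rank < extra else 0)
        ranges.append(range(start, start + count))
        start += count
    return ranges


def local_weighted_sum(intensity, weights, owned):
    """Per-cell weighted sum over the directions owned by one rank.

    intensity[d][c] is the (hypothetical) intensity of direction d in cell c.
    """
    num_cells = len(intensity[0])
    partial = [0.0] * num_cells
    for d in owned:
        for c in range(num_cells):
            partial[c] += weights[d] * intensity[d][c]
    return partial


def all_reduce_sum(partials):
    """Stand-in for a collective all-reduce: element-wise sum of rank partials."""
    return [sum(values) for values in zip(*partials)]
```

In this sketch each rank would call `local_weighted_sum` on its own block of directions, and `all_reduce_sum` combines the per-rank results into the quantity every rank needs for the next iteration. In a real multi-GPU code the combination step would be a library collective rather than a Python loop, and its cost relative to the per-direction work is what determines parallel efficiency.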

Keywords