Yuanzineng kexue jishu (Feb 2022)

Study on Load Balanced Particle Weight Adjustment Algorithm for Particle Parallel Monte Carlo Criticality Calculation

  • FU Yuanguang; LIU Peng; LI Rui; WANG Xin; DENG Li

Journal volume & issue
Vol. 56, no. 2
pp. 316 – 325

Abstract

During the source iteration of a particle-parallel Monte Carlo criticality calculation, stochastic fluctuation can leave the fission banks unevenly distributed among processes. Without a bank adjustment algorithm, using fewer particles per cycle and more cycles leads to a poorer load balance. The Master-Slave algorithm designates one process to gather all fission banks from the other running processes, reallocate them, and broadcast them back at the end of each iteration cycle, which guarantees an equal load allocation. However, the large amount of data transmitted and the frequent gather and broadcast operations result in low parallel efficiency. For Master-Slave, a poor acceleration ratio is found especially in cases with fewer particles per cycle and more cycles. With the Nearest Neighbor algorithm, by contrast, each process keeps a large proportion of its particles locally and sends only a small proportion to its nearest adjacent processes, which greatly reduces the amount of data to transfer. It achieves a better acceleration ratio than Master-Slave and is not sensitive to different settings of particle number and cycle number. In this work, a new load-balancing algorithm is proposed. Instead of transmitting data among processes, this algorithm uses a weight adjustment scheme. The total particle weight is split equally among the processes at the beginning of the simulation. The local total weight is held constant as cycles proceed, while the born weight of each local particle is adjusted according to the size of the local fission bank on each process. Because each process handles an equal particle weight, a good load balance can be achieved. Because each process operates independently, no fission bank data need to be transmitted, which leads to good parallel efficiency. The drawback is that serial and parallel results are no longer consistent with each other, which is not a serious problem for a stochastic simulation.
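The weight bookkeeping described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `RankState`, `split_total_weight` and `born_weight`, and the bank sizes used in `demo`, are hypothetical, and the sketch assumes the born weight is simply the fixed local total weight divided by the local fission bank size.

```python
from dataclasses import dataclass


@dataclass
class RankState:
    """Hypothetical per-process state; field names are illustrative only."""
    local_total_weight: float  # fixed share of the global weight, set once
    n_fission_sites: int       # size of the local fission bank this cycle


def split_total_weight(total_weight: float, n_ranks: int) -> list[float]:
    """Split the total particle weight equally among processes at start-up."""
    return [total_weight / n_ranks] * n_ranks


def born_weight(state: RankState) -> float:
    """Assumed rule: per-particle born weight = local total weight / bank size."""
    if state.n_fission_sites <= 0:
        raise ValueError("local fission bank is empty")
    return state.local_total_weight / state.n_fission_sites


def demo():
    # Made-up numbers: 4 processes sharing a total weight of 1200,
    # with fission bank sizes that fluctuate around 300.
    shares = split_total_weight(1200.0, 4)
    bank_sizes = [290, 310, 305, 295]
    ranks = [RankState(w, n) for w, n in zip(shares, bank_sizes)]
    weights = [born_weight(r) for r in ranks]
    # The local total weight is recovered on every process, with no communication.
    totals = [w * r.n_fission_sites for w, r in zip(weights, ranks)]
    return weights, totals
```

In this sketch each process rescales its own particles using only local information, which is why no fission bank data has to be exchanged. It also shows why serial and parallel runs differ: the born weights depend on how the banks happen to be distributed among processes.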
A simple PWR pincell problem and a 2×2 PWR assembly problem were used to test acceleration ability of different algorithm, with single process to 120 processes was used. It is found that new algorithm achieves a higher acceleration ratio compared to MasterSlave and Nearest Neighbor in different settings of particle and cycle number. Further, BEAVRS whole core problem was used on TianheⅡ supercomputer to test the weak and strong scaling parallel efficiency of the new algorithm, with 9254% and 8147% respectively of 4 800 processes relative to 128 processes.

Keywords