TEB: Efficient SpMV Storage Format for Matrix Decomposition and Reconstruction on GPU

WANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu

doi:10.3778/j.issn.1673-9418.2304039

Jisuanji kexue yu tansuo (Apr 2024)

Affiliations

WANG Yuhua, ZHANG Yuqi, HE Junfei, XU Yuezhu, CUI Huanyu: School of Computer Science and Technology, Harbin Engineering University, Harbin 150000, China

Journal volume & issue: Vol. 18, no. 4
pp. 1094 – 1108

Abstract

Sparse matrix-vector multiplication (SpMV) is a crucial computation in science and engineering. The CSR (compressed sparse row) format is one of the most commonly used storage formats for sparse matrices. When SpMV is parallelized on a graphics processing unit (GPU), CSR stores only the non-zero elements of the sparse matrix, which avoids the redundant computation caused by zero-element padding and saves storage space. However, it suffers from load imbalance, which wastes computing resources. To address this problem, this paper studies storage formats that have performed well in recent years and proposes TEB (threshold-exchange-order block), a storage format that decomposes and reorganizes the matrix row by row. TEB first uses a heuristic threshold selection algorithm to determine a suitable segmentation threshold, then applies a reordering-based row merging algorithm to reconstruct and decompose the sparse matrix so that the blocks contain as nearly equal numbers of non-zero elements as possible. Building on this format and on CUDA (compute unified device architecture) threads, the paper proposes a parallel SpMV algorithm across sub-blocks that allocates computing resources sensibly, resolves the load imbalance, and thus improves the parallel efficiency of SpMV. To verify the effectiveness of the TEB storage format, experiments are conducted on an NVIDIA Tesla V100 platform. The results show that, compared with the PBC (partition-block-CSR), AMF-CSR (adaptive multi-row folding of CSR), CSR-Scalar (compressed sparse row-scalar), and CSR5 (compressed sparse row 5) storage formats, TEB improves SpMV time performance by an average of 3.23×, 5.83×, 2.33×, and 2.21×, respectively, and floating-point computing performance by an average of 3.36×, 5.95×, 2.29×, and 2.13×, respectively.
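The load-balancing idea described above can be illustrated with a small Python sketch. It is not the TEB algorithm: the abstract does not specify the heuristic threshold selection or the reordering-based row merging, so the helper names (`build_blocks`, `blocked_spmv`) and the greedy strategy below are hypothetical. Rows are visited in descending order of non-zero count (a simple stand-in for reordering) and packed into blocks of exactly `threshold` non-zeros (the last block may hold fewer). A row that straddles a block boundary is split, so every block carries nearly the same amount of work. For contrast, CSR-Scalar assigns one thread per row, so a single long row stalls its thread.

```python
"""Hypothetical sketch: CSR SpMV with nnz-balanced work blocks (not TEB itself)."""
from typing import List, Tuple

# A segment (row, start, end) covers non-zeros [start, end) of one row in vals/col_idx.
Segment = Tuple[int, int, int]


def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference CSR SpMV: y[i] = sum of vals[k] * x[col_idx[k]] over row i."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def build_blocks(row_ptr, threshold) -> List[List[Segment]]:
    """Assumed strategy: visit rows by descending nnz, fill blocks of exactly
    `threshold` non-zeros, and split a row wherever a block boundary falls."""
    n_rows = len(row_ptr) - 1
    order = sorted(range(n_rows),
                   key=lambda i: row_ptr[i + 1] - row_ptr[i], reverse=True)
    blocks: List[List[Segment]] = []
    current: List[Segment] = []
    load = 0
    for i in order:
        start, end = row_ptr[i], row_ptr[i + 1]
        while start < end:
            take = min(end - start, threshold - load)  # always >= 1 here
            current.append((i, start, start + take))
            load += take
            start += take
            if load == threshold:  # block full: close it and start a new one
                blocks.append(current)
                current, load = [], 0
    if current:
        blocks.append(current)
    return blocks


def blocked_spmv(row_ptr, col_idx, vals, x, blocks):
    """Sequential stand-in for a GPU kernel where each block would map to one
    thread or warp; split rows would need an atomic add on y[row]."""
    y = [0.0] * (len(row_ptr) - 1)
    for block in blocks:
        for row, start, end in block:
            partial = 0.0
            for k in range(start, end):
                partial += vals[k] * x[col_idx[k]]
            y[row] += partial
    return y


# 4x4 example: [[1 2 3 4], [0 5 0 0], [6 0 0 0], [0 0 7 8]]
row_ptr = [0, 4, 5, 6, 8]
col_idx = [0, 1, 2, 3, 1, 0, 2, 3]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [1.0, 1.0, 1.0, 1.0]
blocks = build_blocks(row_ptr, threshold=3)
print(blocked_spmv(row_ptr, col_idx, vals, x, blocks))  # [10.0, 5.0, 6.0, 15.0]
```

With `threshold=3`, the example produces three blocks holding 3, 3, and 2 non-zeros, whereas one thread per row would give one thread 4 non-zeros and two others only 1 each.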

Keywords: sparse matrix-vector multiplication (SpMV); reordering; compressed sparse row (CSR) format; load balancing; storage format; graphics processing unit (GPU)

Published in Jisuanji kexue yu tansuo

ISSN: 1673-9418 (Print)
Publisher: Journal of Computer Engineering and Applications Beijing Co., Ltd., Science Press
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://fcst.ceaj.org