Jisuanji kexue yu tansuo (Sep 2020)

Design and Optimization of Parallel LZMA for Many-Core Sunway Processor

  • LI Bingzheng, HUANG Gaoyang, XU Jinchen

DOI
https://doi.org/10.3778/j.issn.1673-9418.1909070
Journal volume & issue
Vol. 14, no. 9
pp. 1501 – 1509

Abstract

In recent years, the growth of high-performance computing and scientific computing applications has caused an explosion in the volume of data transmitted, stored, and processed by high-performance computing cluster systems. Efficient compression of large-scale data is therefore needed to improve the performance of these systems, since it reduces both the space required for data storage and the communication bandwidth required for transmission. Among lossless compression algorithms, LZMA (Lempel-Ziv-Markov chain algorithm) achieves a high compression ratio, but its serial implementation compresses very slowly. Many studies use parallel computing on multi-core architectures to improve the performance of lossless compression algorithms. This paper proposes a parallel design and optimization of LZMA for the Sunway 26010 heterogeneous many-core processor. Taking the features of this processor into account, the paper analyzes several key factors that affect LZMA performance, including space requirements, memory access characteristics, and hotspot functions. Based on the Athread interface, the sliding window algorithm of LZMA is restructured for multi-threaded parallel execution. The LDM address space is divided at fine granularity and optimized to achieve better cache performance. A cyclic sliding window algorithm is also implemented using DMA double buffering. Test results show that, on the Silesia Corpus benchmark, the final optimized LZMA achieves a maximum speedup of 4.1 times over the serial baseline implementation on the controller core, and a speedup of 5.3 times on the big data benchmark.
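
The general idea of parallelizing LZMA can be illustrated with a minimal block-parallel sketch. This is not the paper's Athread, LDM, or DMA implementation for the Sunway processor. It is a generic example that assumes Python's standard `lzma` module and a thread pool, and the helper names `split_blocks`, `parallel_compress`, and `decompress`, along with the block size and worker count, are hypothetical choices made for illustration only.

```python
import lzma
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data: bytes, block_size: int) -> list:
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def parallel_compress(data: bytes, block_size: int = 1 << 20, workers: int = 4) -> bytes:
    """Compress each block as an independent .xz stream and concatenate them.

    Hypothetical helper: block_size and workers are illustrative defaults.
    """
    # Keep at least one (possibly empty) block so the output is a valid stream.
    blocks = split_blocks(data, block_size) or [b""]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so the streams stay in block order.
        streams = list(pool.map(lzma.compress, blocks))
    return b"".join(streams)


def decompress(blob: bytes) -> bytes:
    """lzma.decompress accepts a concatenation of several compressed streams."""
    return lzma.decompress(blob)
```

Because each block is compressed independently, matches that would span a block boundary are lost, so this kind of scheme typically trades some compression ratio for parallelism.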

Keywords