IEEE Access (Jan 2024)
Gem5-AVX: Extension of the Gem5 Simulator to Support AVX Instruction Sets
Abstract
Commodity x86 CPUs still power the majority of supercomputers, and most of them implement vector architectures that support single instruction, multiple data (SIMD) execution. Architectural exploration relies on computer architecture simulators, and although many simulators have been developed, only a few support recent x86 SIMD instructions. This paper describes gem5-AVX, an extended version of the gem5 simulator that can simulate recent x86 SIMD extensions and is targeted especially at high-performance computing (HPC). Gem5-AVX supports Advanced Vector Extensions (AVX), AVX2, and subsets of AVX-512, excluding cache and memory management instructions. It also covers the full set of Streaming SIMD Extensions (SSE) and the subsequent extensions required to simulate HPC workloads. It can simulate the key features of AVX, AVX2, and AVX-512, such as 256- and 512-bit-wide registers, three- and four-operand syntax, fused multiply-add (FMA), vector gather-scatter using vector scale-index-base (VSIB) addressing, mask registers, embedded broadcasting, and the compressed-displacement memory addressing mode. We evaluate the accuracy of gem5-AVX by comparing its results with those of real hardware and Intel’s Software Development Emulator (SDE) on benchmark suites representative of the HPC field, i.e., High-Performance Linpack (HPL), High-Performance Conjugate Gradient (HPCG), and the NAS Parallel Benchmarks (NPB). We also compare the speed-up of the HPL benchmark on gem5 and gem5-AVX across combinations of configurations. Gem5-AVX is more accurate than gem5: its mean absolute percentage errors are 7.3–9.2% for Haswell and 9.2–11.9% for Skylake processors, whereas those of gem5 are 17.9–21.5% and 19.7–29.7%, respectively.
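The accuracy figures in the abstract are mean absolute percentage errors (MAPE). The following is a minimal sketch, assuming the standard definition with real-hardware measurements as the reference. The abstract does not say which quantity is compared (for example, execution time) or how results are aggregated per benchmark, so the function name `mape` and the input values are hypothetical and are not data from the paper.

```python
from typing import Sequence


def mape(reference: Sequence[float], simulated: Sequence[float]) -> float:
    """Mean absolute percentage error of simulated values against reference values.

    Assumes the standard definition: 100/n * sum(|ref - sim| / |ref|).
    """
    if len(reference) != len(simulated) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for ref, sim in zip(reference, simulated):
        if ref == 0:
            raise ValueError("reference values must be non-zero")
        # Relative error of one simulated value with respect to its reference.
        total += abs(ref - sim) / abs(ref)
    # Average the relative errors and express the result as a percentage.
    return 100.0 * total / len(reference)


# Hypothetical measurements (e.g., seconds) for three runs; NOT results from the paper.
hardware = [10.0, 20.0, 40.0]
simulator = [11.0, 18.0, 40.0]
print(f"MAPE = {mape(hardware, simulator):.2f}%")  # prints "MAPE = 6.67%"
```

Because each error is normalized by its own reference value, a short run and a long run weigh equally in the average. This is why MAPE suits comparisons across benchmarks whose absolute run times differ by orders of magnitude.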
Keywords