IEEE Access (Jan 2020)
Runtime Adaptive Matrix Multiplication for the SW26010 Many-Core Processor
Abstract
The study of matrix multiplication on the emerging SW26010 processor is highly significant for many scientific and engineering applications. The state-of-the-art implementation, SWMM from the swBLAS library, focuses mainly on the infrequent case involving special matrix dimensions and determines the execution action of matrix multiplication with a single specified algorithm. To adapt to a wider variety of matrix shapes, this article presents a runtime adaptive matrix multiplication methodology, called RTAMM, which targets the features of the SW26010 architecture. The execution action of RTAMM is determined dynamically at runtime via several fundamental cost formulas and multiple sets of blocking factors, rather than being fixed at library generation time. Based on comprehensive trade-offs between computation and data access, architecture-oriented optimization methods are introduced at three levels (macro, assistant, and micro) to fully exploit the computing capability of the SW26010. The experiments show that RTAMM achieves peak performance competitive with SWMM. Moreover, in tests on 6000 different matrix multiplication cases, RTAMM outperforms SWMM in 85.55% of the cases, with improvements ranging from 5% to 308%, whereas RTAMM is slightly inferior to SWMM in only 1.28% of the cases. These results demonstrate that RTAMM offers both strong adaptability and considerable performance improvement.
Keywords