IEEE Access (Jan 2019)

MT-DMA: A DMA Controller Supporting Efficient Matrix Transposition for Digital Signal Processing

  • Sheng Ma,
  • Yuanwu Lei,
  • Libo Huang,
  • Zhiying Wang

DOI
https://doi.org/10.1109/ACCESS.2018.2889558
Journal volume & issue
Vol. 7
pp. 5808–5818

Abstract


Matrix transposition plays a critical role in digital signal processing. However, existing matrix transposition implementations have significant limitations. A traditional design uses load and store instructions to perform matrix transposition. Depending on the number of load/store units, this design typically transposes up to one matrix element per clock cycle. More seriously, it cannot perform matrix transposition and data calculations in parallel. Modern digital signal processors integrate support for matrix transposition into the direct memory access (DMA) controller, so the matrix can be transposed during data movement. This allows matrix transposition and data calculations to execute in parallel. However, bandwidth utilization remains limited, because the DMA controller can transfer only one matrix element per clock cycle. To address the limitations of the existing designs, we propose matrix transposition DMA (MT-DMA), which supports efficient matrix transposition in DMA controllers. MT-DMA transposes multiple matrix elements per clock cycle to improve bandwidth utilization. Compared with the existing designs, MT-DMA achieves up to a 23.9 times performance improvement on micro-benchmarks and is also more energy efficient. Because MT-DMA effectively hides the latency of matrix transposition behind data calculations, it performs very close to an ideal design on real applications.
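The abstract contrasts designs that move one matrix element per clock cycle with MT-DMA, which moves several. The sketch below is a toy software model of that difference, not the authors' hardware design. It assumes a bus beat can write up to k contiguous destination elements, and that the transposer buffers a hypothetical k x k tile so each transposed column segment leaves as a single contiguous write. The function names and the parameter k are illustrative assumptions, and only destination write beats are counted.

```python
from typing import List, Tuple


def transpose_one_per_beat(src: List[int], rows: int, cols: int) -> Tuple[List[int], int]:
    """Hypothetical baseline: transposed writes are strided, so each beat
    writes a single element (one element per cycle)."""
    dst = [0] * (rows * cols)  # result is cols x rows, row-major
    beats = 0
    for r in range(rows):
        for c in range(cols):
            dst[c * rows + r] = src[r * cols + c]
            beats += 1  # one write beat per element
    return dst, beats


def transpose_tiled(src: List[int], rows: int, cols: int, k: int) -> Tuple[List[int], int]:
    """Hypothetical tiled model: buffer a k x k tile, then emit each tile
    column as one contiguous destination segment (up to k elements per beat)."""
    dst = [0] * (rows * cols)
    beats = 0
    for br in range(0, rows, k):
        h = min(k, rows - br)  # tile height (smaller at the bottom edge)
        for bc in range(0, cols, k):
            w = min(k, cols - bc)  # tile width (smaller at the right edge)
            # Load the tile: h row segments of length w from the source.
            tile = [src[(br + i) * cols + bc:(br + i) * cols + bc + w] for i in range(h)]
            for j in range(w):
                # Tile column j becomes a contiguous run in destination row bc + j.
                start = (bc + j) * rows + br
                dst[start:start + h] = [tile[i][j] for i in range(h)]
                beats += 1  # one write beat moves h elements
    return dst, beats


if __name__ == "__main__":
    rows, cols = 8, 8
    src = list(range(rows * cols))
    ref, n1 = transpose_one_per_beat(src, rows, cols)
    out, n4 = transpose_tiled(src, rows, cols, k=4)
    assert out == ref
    print(n1, n4)  # 64 16
```

In this model the tiled version needs ceil(rows / k) * cols write beats instead of rows * cols, so for an 8 x 8 matrix with k = 4 it uses 16 beats rather than 64. Real speedups depend on the bus width, buffer size, and memory system, which this sketch does not model.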

Keywords