IEEE Access (Jan 2022)
Effective Hardware Accelerator for 2D DCT/IDCT Using Improved Loeffler Architecture
Abstract
This paper proposes an effective hardware accelerator for 2D $8\times 8$ discrete cosine transform (DCT) and inverse discrete cosine transform (IDCT) using an improved Loeffler architecture. The accelerator optimizes the data stream of the Loeffler 8-point 1D DCT/IDCT according to the characteristics of image and video processing. An 8-stage pipeline structure greatly improves the processing speed by reasonably dividing the number of clock cycles and simplifying the arithmetic operations in each cycle. The multiplication-free approximation of the DCT coefficients is implemented through adders and shifters, combined with both fixed-point and canonic signed digit (CSD) coding. In particular, the proposed fast parallel transposed matrix architecture achieves the function of row-column coefficient conversion with lower circuit complexity. The FPGA implementation of the proposed architecture uses a Virtex-7 XC7VX330T device, running at 288 MHz with a throughput of 558 M Pixel/sec, and a Full HD real-time frame rate of up to 269 fps. Only 33 cycles are required to complete the $8\times 8$ blocks of 2D DCT/IDCT, which can be used as a high-performance hardware accelerator for image and video compression encoding.
Keywords