IEEE Access (Jan 2016)
1.5 Gbit/s FPGA Implementation of a Fully-Parallel Turbo Decoder Designed for Mission-Critical Machine-Type Communication Applications
Abstract
In wireless communication schemes, turbo codes facilitate near-capacity transmission throughputs by achieving reliable forward error correction. However, owing to the serial data dependencies imposed by the underlying logarithmic Bahl-Cocke-Jelinek-Raviv (Log-BCJR) algorithm, the limited processing throughputs of conventional turbo decoder implementations impose a severe bottleneck upon the overall throughputs of real-time wireless communication schemes. Motivated by this, we recently proposed a fully parallel turbo decoder (FPTD) algorithm, which eliminates these serial data dependencies, allowing parallel processing and hence offering a significantly higher processing throughput. In this paper, we propose a novel resource-efficient version of the FPTD algorithm, which reduces its computational resource requirement by 50%, which enhancing its suitability for field-programmable gate array (FPGA) implementations. We propose a model FPGA implementation. When using a Stratix IV FPGA, the proposed FPTD FPGA implementation achieves an average throughput of 1.53 Gb/s and an average latency of 0.56 μs, when decoding frames comprising N = 720 b. These are, respectively, 13.2 times and 11.1 times superior to those of the state-of-the-art FPGA implementation of the Log-BCJR long-term evolution (LTE) turbo decoder, when decoding frames of the same frame length at the same error correction capability. Furthermore, our proposed FPTD FPGA implementation achieves a normalized resource usage of 0.42 (kALUTs/Mb/s), which is 5.2 times superior to that of the benchmarker decoder. Furthermore, when decoding the shortest N = 40-b LTE frames, the proposed FPTD FPGA implementation achieves an average throughput of 442 Mb/s and an average latency of 0.18 μs, which are, respectively, 21.1 times and 10.6 times superior to those of the benchmarker decoder. In this case, the normalized resource usage of 0.08 (kALUTs/Mb/s) is 146.4 times superior to that of the benchmarker decoder.
Keywords