IEEE Access (Jan 2024)
High-Throughput, Sorted QR Accelerator for Non-Linear Processing in Open-RAN Systems
Abstract
Open Radio Access Networks (Open-RANs) require cost- and energy-efficient solutions to facilitate their large-scale deployment. A significant concern in multiple-input multiple-output (MIMO) systems employing traditional linear processing is the large number of radio frequency (RF) chains required at the base station (BS) to ensure accurate decoding of the spatially multiplexed streams. Recently, however, practical non-linear approaches, which facilitate near-optimal parallelizable tree searches, have been successfully implemented on actual systems and have been shown to considerably reduce the number of required RF chains without affecting user performance. Just as linear systems use QR decomposition (QRD) to perform channel inversion, these non-linear approaches employ a sorted QRD (SQRD) to curtail the search complexity. However, the SQRD can be a significant bottleneck for general software-based non-linear solutions, preventing them from fully exploiting their gains. To address the latency limitations of SQRD, this work presents a high-throughput hardware accelerator that reformulates the underlying modified Gram-Schmidt (MGS) process to extract more parallelism than previous designs. Implementations of the proposed architecture demonstrate at least 2-fold improvements in achievable throughput and processing latency over existing $4\times 4$ and $8\times 8$ field-programmable gate array (FPGA) implementations, and the architecture can be scaled up to $16\times 16$ MIMO systems. Furthermore, the proposed accelerator was integrated with a software framework that can considerably offload the processing burden for a higher number of streams under strict latency conditions.
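For orientation, the following is a minimal NumPy sketch of a textbook sorted QRD built on the MGS process. It is not the reformulated, parallelized architecture proposed in this work; it only illustrates the baseline computation that such an accelerator targets. The function name `sorted_qr_mgs`, the choice to process the weakest remaining column first, and the random test channel are assumptions made for illustration.

```python
import numpy as np


def sorted_qr_mgs(H):
    """Textbook sorted QR decomposition via modified Gram-Schmidt.

    Returns (Q, R, perm) such that H[:, perm] is approximately Q @ R,
    with Q having orthonormal columns and R upper triangular.
    Assumes H has full column rank and at least as many rows as columns.
    """
    H = np.asarray(H, dtype=complex)
    n_tx = H.shape[1]
    Q = H.copy()
    R = np.zeros((n_tx, n_tx), dtype=complex)
    perm = np.arange(n_tx)
    # Squared norms of the residual columns; used only for sorting.
    norms = np.sum(np.abs(Q) ** 2, axis=0)

    for i in range(n_tx):
        # Sorting step: move the remaining column with the smallest
        # residual norm into position i (illustrative ordering rule).
        k = i + int(np.argmin(norms[i:]))
        Q[:, [i, k]] = Q[:, [k, i]]
        R[:, [i, k]] = R[:, [k, i]]
        perm[[i, k]] = perm[[k, i]]
        norms[[i, k]] = norms[[k, i]]

        # Normalize the selected column.
        R[i, i] = np.linalg.norm(Q[:, i])
        Q[:, i] = Q[:, i] / R[i, i]

        # MGS update: remove the q_i component from every later column.
        for j in range(i + 1, n_tx):
            R[i, j] = np.vdot(Q[:, i], Q[:, j])  # q_i^H q_j
            Q[:, j] = Q[:, j] - R[i, j] * Q[:, i]
            norms[j] = norms[j] - np.abs(R[i, j]) ** 2

    return Q, R, perm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 4x4 complex channel matrix, for illustration only.
    H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Q, R, perm = sorted_qr_mgs(H)
    print("column order:", perm)
    print("reconstruction error:", np.linalg.norm(H[:, perm] - Q @ R))
```

In this sketch the inner loop over `j` has no dependencies between iterations, which is the kind of parallelism a hardware design can exploit; the sorting step and the outer loop over `i` remain sequential.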
Keywords