IEEE Access (Jan 2024)
Trinity: In-Database Near-Data Machine Learning Acceleration Platform for Advanced Data Analytics
Abstract
The ability to perform machine learning (ML) tasks in a database management system (DBMS) is a new paradigm for conventional database systems, as it enables advanced data analytics on top of the well-established capabilities of DBMSs. However, integrating ML into DBMSs introduces new challenges for traditional CPU-based systems because of its higher computational demands and larger data bandwidth requirements. To address this, hardware acceleration has become even more important in database systems, and the computational storage device (CSD), which places an accelerator near storage, is considered an effective solution because it offers high processing power with no extra data movement cost. In this paper, we propose Trinity, an end-to-end database system that provides an in-database, in-storage platform for accelerating advanced analytics queries that invoke trained ML models along with complex data operations. By designing a full stack from the DBMS’s internal software components to the hardware accelerator, Trinity enables in-database ML pipelines on the CSD. On the software side, we extend the internals of conventional DBMSs to utilize the accelerator in the SmartSSD. Our extended analyzer evaluates the compatibility of the current query with our hardware accelerator and compresses compatible queries into a 24-byte numeric format for efficient hardware processing. Furthermore, the predictor is extended to integrate our performance cost models so that queries are always offloaded to the optimal hardware backend. The proposed SmartSSD cost model mathematically models our hardware, including host operations, data transfers, and FPGA kernel execution time, while the CPU cost model uses polynomial regression ML models to predict complex CPU latency. On the hardware side, we introduce the in-database processing accelerator (i-DPA), a custom FPGA-based accelerator. i-DPA includes a database page decoder to fully exploit the bandwidth benefit of near-storage processing.
It also employs dynamic tuple binding to enhance the overall parallelism and hardware utilization. i-DPA’s architecture, which combines heterogeneous computing units with a reconfigurable on-chip interconnect, also allows seamless data streaming, enabling task-level pipelining across different computing units. Finally, our evaluation shows that Trinity improves the end-to-end performance of analytics queries by $15.21\times$ on average and up to $57.18\times$ compared to the conventional CPU-based DBMS platform. We also show that Trinity’s performance scales linearly with multiple SmartSSDs, achieving up to nearly $200\times$ speedup over the baseline with four SmartSSDs.
Keywords