IEEE Access (Jan 2025)

Optimizing Matrix-Vector Operations With CGLA for High-Performance Approximate k-NN Search

  • Dohyun Kim,
  • Yasuhiko Nakashima

DOI
https://doi.org/10.1109/access.2025.3582825
Journal volume & issue
Vol. 13
pp. 111087–111097

Abstract

In this paper, we propose a novel approach to accelerating approximate k-nearest neighbor (k-NN) search, a crucial operation in modern vector databases such as OpenSearch, ElasticSearch, Pinecone, and Milvus. Our method specifically targets the post-clustering phase of the inverted file (IVF) technique, where the bulk of the computational effort is devoted to matrix-vector (MV) multiplication. The primary innovation lies in an optimization technique for IMAX3, a Coarse-Grained Linear Array (CGLA), which effectively leverages local memory to retain vectors between queries. This approach eliminates the need to reload vectors for subsequent queries, thereby minimizing memory movement and significantly reducing latency. Experimental evaluations demonstrate that, when excluding overhead, IMAX3 achieves up to a 54.9× speedup for repeat queries compared with the initial execution, substantially outperforming conventional methods and making our solution highly suitable for real-time vector search applications. Moreover, relative to the RTX 4090, we achieve an energy-delay product (EDP) reduction of up to 498.41× and of 185.23× on average.
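To make the IVF setting concrete, the following NumPy sketch shows why the post-clustering phase reduces to matrix-vector products and why keeping cluster vectors resident across queries avoids reloading them. It is an illustration only, not the paper's IMAX3/CGLA implementation; the names (IVFIndex, cluster_mats, n_probe) and parameter choices are assumptions for this example.

# Minimal IVF-style approximate k-NN sketch (illustrative, not the authors' code).
import numpy as np

class IVFIndex:
    def __init__(self, data, n_clusters=16, n_iters=10, seed=0):
        rng = np.random.default_rng(seed)
        self.data = data
        # Plain k-means builds the coarse quantizer (cluster centroids).
        self.centroids = data[rng.choice(len(data), n_clusters, replace=False)]
        for _ in range(n_iters):
            assign = np.argmin(
                ((data[:, None, :] - self.centroids[None, :, :]) ** 2).sum(-1), axis=1)
            for c in range(n_clusters):
                members = data[assign == c]
                if len(members):
                    self.centroids[c] = members.mean(axis=0)
        # Inverted lists: per cluster, the ids of vectors assigned to it.
        self.lists = [np.where(assign == c)[0] for c in range(n_clusters)]
        # Analogue of keeping vectors in local memory: cluster matrices stay
        # resident, so repeat queries reuse them instead of reloading vectors.
        self.cluster_mats = [data[ids] for ids in self.lists]

    def search(self, query, k=10, n_probe=4):
        # Coarse step: pick the n_probe clusters whose centroids are closest.
        d2 = ((self.centroids - query) ** 2).sum(axis=1)
        probes = np.argsort(d2)[:n_probe]
        ids, dists = [], []
        for c in probes:
            mat = self.cluster_mats[c]  # resident cluster matrix
            if len(mat) == 0:
                continue
            # Post-clustering phase: one matrix-vector product per probed
            # cluster dominates the work (squared L2 distance up to a
            # query-norm constant, which does not affect the ranking).
            scores = (mat ** 2).sum(axis=1) - 2.0 * (mat @ query)
            ids.append(self.lists[c])
            dists.append(scores)
        ids = np.concatenate(ids)
        dists = np.concatenate(dists)
        top = np.argsort(dists)[:k]
        return ids[top]

# Usage: build the index once, then issue repeated queries against it.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    base = rng.standard_normal((5000, 64)).astype(np.float32)
    index = IVFIndex(base, n_clusters=32)
    q = rng.standard_normal(64).astype(np.float32)
    print(index.search(q, k=5))

In this sketch the speedup for repeat queries comes only from keeping cluster_mats in memory; the paper's contribution is mapping this reuse onto the CGLA's local memory so the MV multiplications avoid off-chip data movement.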

Keywords