IEEE Access (Jan 2019)

iFLEX: A Fully Open-Source, High-Density Field-Programmable Gate Array (FPGA)-Based Hardware Co-Processor for Vector Similarity Searching

  • Ludovico Minati,
  • Vardan Movsisyan,
  • Matthew McCormick,
  • Khachatur Gyozalyan,
  • Tigran Papazyan,
  • Hrach Makaryan,
  • Stefano Aldrigo,
  • Taron Harutyunyan,
  • Hayk Ghaltaghchyan,
  • Chris McCormick,
  • Mick Fandrich

DOI
https://doi.org/10.1109/ACCESS.2019.2934715
Journal volume & issue
Vol. 7
pp. 112269 – 112283

Abstract

Vector similarity searching consists of comparing a query vector against a large number of entries in a reference data set, according to a chosen similarity metric such as the L1-norm, L2-norm, or Hamming distance. Large-scale research and commercial applications of these computations are developing rapidly across artificial intelligence fields as diverse as semantic text querying, retrieval of multimedia materials, and prediction of the properties of pharmacological molecules and engineered materials. While vector similarity searching is, at present, predominantly implemented on standard central processing unit (CPU) hardware running optimized indexing algorithms, interest in massively parallel computing architectures is increasing. Accordingly, a range of systems based on graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) have been proposed; however, the design materials for these systems remain largely confined to a small number of corporations and research institutions. Here, we introduce a fully open-source hardware accelerator for vector similarity searching, based on an array of 21 FPGAs densely intertwined with 42 GB of high-speed dynamic memory and installed on a custom-designed compute node board, which yields an aggregate bandwidth of 33.6 GB/s and can be seamlessly reconfigured to implement nearly arbitrary distance calculations. A novel logic and software architecture, based on a lane-wise organization of independent engines implementing distributed distance calculation and sorting, attains noteworthy query latency and power consumption performance on both single- and multi-node system configurations. The entire circuit board hardware, FPGA logic, and host software design is presented here and freely provided for unlimited use, supporting open innovation and research in this area.
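
As a rough software illustration of the computation described above, the following Python sketch performs a brute-force similarity search with the three metrics named in the abstract and mimics a lane-wise organization: the reference set is split into lanes, each lane computes distances and keeps its own best matches, and the per-lane results are merged. It is not the iFLEX logic or host software; the function names, the round-robin split, and the `n_lanes` parameter are assumptions made purely for illustration.

```python
import heapq


def l1(a, b):
    """L1-norm (Manhattan) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a, b):
    """L2-norm (Euclidean) distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hamming(a, b):
    """Hamming distance between two bit vectors stored as non-negative ints."""
    return bin(a ^ b).count("1")


def lane_top_k(lane, query, metric, k):
    """One hypothetical lane: compute distances, keep the k smallest.

    `lane` is a list of (global_index, vector) pairs.
    """
    return heapq.nsmallest(k, ((metric(query, v), i) for i, v in lane))


def search(reference, query, metric, k, n_lanes=4):
    """Split the reference set across lanes, then merge per-lane results.

    The round-robin split and n_lanes value are illustrative assumptions.
    The merge is exact because every global top-k entry is necessarily
    among the top k of the lane that holds it.
    """
    indexed = list(enumerate(reference))
    lanes = [indexed[j::n_lanes] for j in range(n_lanes)]
    partial = [lane_top_k(lane, query, metric, k) for lane in lanes]
    return heapq.nsmallest(k, (hit for hits in partial for hit in hits))


if __name__ == "__main__":
    reference = [(0, 0), (3, 4), (1, 1), (10, 10), (0, 1)]
    # Each result is a (distance, index) pair, nearest first.
    print(search(reference, (0, 0), l1, k=2, n_lanes=3))  # [(0, 0), (1, 4)]
```

Keeping only k candidates per lane means the final merge handles at most k times the number of lanes entries, regardless of the size of the reference set. This is one plausible reason to distribute sorting alongside distance calculation, though the sketch makes no claim about how the hardware itself does so.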

Keywords