Современные информационные технологии и IT-образование (Mar 2022)
Horizontably Scalable LSM-tree Based Index for Full-Text Search Boolean Queries
Abstract
Let there be a set of objects, where each object is characterized by Boolean features. Let logical query or Boolean query denote a query to find objects characterized by a Boolean function by features, for example, “documents with all words ...and any words ...and without words ...”. The terms “object” and “document” are used interchangeably hereinafter. The features may have different semantics, i. e. some features may correspond to the words of the document, some to labels or categories, and some to per-bit time quantum of a document’s date. Although indices executing Boolean queries are well researched in the literature, the common technique of maintaining posting lists is not always acceptable. If the data volume can reach to the order of petabytes, a compact index structure becomes vital. The aim of the research is to suggest a method to build an efficient bitmap index in secondary memory that allows us to update or append data being indexed with high write throughput. We propose an efficient LSM-based index for bitmaps and feature design for practical applications. We also discuss aspects of building of joined indices in order to achieve good scalability. The paper describes the architecture and algorithms of a suggested index and includes results of our experiments that show sustainable performance of our solution.
Keywords