IEEE Access (Jan 2024)

Sherlock in OSS: A Novel Approach of Content-Based Searching in Object Storage System

  • Jannatun Noor,
  • Md. Sadiqul Islam Sakif,
  • Joyanta Jyoti Mondal,
  • Mir Rownak Ali Uday,
  • Rizwanul Haque Ratul,
  • Sriram Chellappan,
  • A. B. M. Alim Al Islam

DOI
https://doi.org/10.1109/ACCESS.2024.3401074
Journal volume & issue
Vol. 12
pp. 69456 – 69474

Abstract

Cloud-based Object Storage Systems (OSS) are known for their scalability, durability, availability, and concurrency. However, open-source OSS noticeably lack a straightforward way for users and administrators to search data within object storage without fully utilizing the cloud infrastructure. In our research, we present Sherlock, a novel Content-Based Searching (CoBS) framework. Sherlock enhances search capabilities by extracting additional information from images and documents and incorporating it into an Elasticsearch-powered database to enable content-driven searches. The framework operates through a two-stage process. First, it classifies the incoming data by type, directing images to an object detection model and processing documents for keyword extraction. Then, Elasticsearch catalogs the extracted data, facilitating searches based on content. The effectiveness of our searches depends largely on the precision of these models, which we improve by training them on large-scale datasets: the Microsoft COCO Dataset for multimedia content and the SemEval2017 Dataset for text documents. We further test our system’s performance by integrating it with the open-source OSS OpenStack Swift and conducting real-world experiments with image uploads to evaluate how our model performs within Swift’s object storage environment.
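
The two-stage process described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`detect_objects`, `extract_keywords`, `ingest`) and the `ContentIndex` class are hypothetical, the object detector is a stub, the keyword extractor is a naive word filter, and an in-memory inverted index stands in for Elasticsearch. The sketch only shows the routing idea: classify an upload by type, extract terms with the matching extractor, then catalog those terms so the object can be found by content.

```python
import mimetypes
from collections import defaultdict


def detect_objects(data: bytes) -> list[str]:
    """Hypothetical stub for an object detection model; returns a fixed label."""
    return ["placeholder-object"]


def extract_keywords(data: bytes) -> list[str]:
    """Naive stand-in for keyword extraction: unique lowercase words longer than 3 characters."""
    words = data.decode("utf-8", errors="ignore").lower().split()
    cleaned = {w.strip(".,;:!?") for w in words}
    return sorted(w for w in cleaned if len(w) > 3)


class ContentIndex:
    """In-memory inverted index, used here in place of Elasticsearch (assumption)."""

    def __init__(self) -> None:
        self._terms: dict[str, set[str]] = defaultdict(set)

    def add(self, object_name: str, terms: list[str]) -> None:
        for term in terms:
            self._terms[term].add(object_name)

    def search(self, term: str) -> list[str]:
        return sorted(self._terms.get(term.lower(), set()))


def ingest(index: ContentIndex, object_name: str, data: bytes) -> str:
    """Stage 1: classify by type and extract terms. Stage 2: catalog them in the index."""
    mime, _ = mimetypes.guess_type(object_name)
    if mime and mime.startswith("image/"):
        index.add(object_name, detect_objects(data))
        return "image"
    if mime and mime.startswith("text/"):
        index.add(object_name, extract_keywords(data))
        return "document"
    return "skipped"  # other types are not indexed in this sketch


if __name__ == "__main__":
    idx = ContentIndex()
    ingest(idx, "photo.jpg", b"\xff\xd8")
    ingest(idx, "notes.txt", b"Object storage scales well.")
    print(idx.search("storage"))
```

In a real deployment the stub extractors would be replaced by trained models and `ContentIndex` by calls to an Elasticsearch cluster; those parts are deliberately left out here.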

Keywords