SoftwareX (Feb 2024)

DARS: Decentralized Article Retrieval System

  • Adrian Alexandrescu
  • Cristian Nicolae Butincu

Journal volume & issue
Vol. 25
p. 101624

Abstract

DARS is a decentralized article retrieval system designed to bring the community together for parallel and distributed extraction of article content from the web. The system comprises three types of components: web crawlers that extract links from website pages, web scrapers that extract article information from the pages identified as articles, and a retrieval core that manages the extraction process. To achieve decentralization, multiple instances of the system can be deployed in different locations. When a client queries one of the nodes, the returned information can be an aggregate of data from multiple nodes. The system is flexible and can be adapted to extract other types of information in a decentralized manner.
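
The three roles described above can be illustrated with a minimal Python sketch. This is not the DARS implementation: the names `crawl`, `scrape`, `aggregate`, and `Article` are hypothetical, the scraper extracts only the page title, and the aggregation rule (keep the first article seen per URL) is an assumption made for illustration. Only the standard library is used.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Crawler role: collect absolute link targets from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


class TitleExtractor(HTMLParser):
    """Scraper role: pull the <title> text out of an article page."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


@dataclass(frozen=True)
class Article:
    """Hypothetical article record; real records would carry more fields."""
    url: str
    title: str


def crawl(base_url, html):
    """Return the absolute URLs of all links found in the page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


def scrape(url, html):
    """Return an Article built from the page's title."""
    parser = TitleExtractor()
    parser.feed(html)
    return Article(url=url, title=parser.title.strip())


def aggregate(*node_results):
    """Merge per-node result lists, keeping the first article seen per URL
    (an assumed deduplication rule, chosen for simplicity)."""
    merged = {}
    for results in node_results:
        for article in results:
            merged.setdefault(article.url, article)
    return list(merged.values())
```

In this sketch, a retrieval core would call `crawl` on fetched pages to discover links, hand pages identified as articles to `scrape`, and answer a client query by calling `aggregate` on its own results together with those returned by other nodes.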

Keywords