Journal of Telecommunications and Information Technology (Jun 2014)

Cassiopeia – Towards a Distributed and Composable Crawling Platform

  • Leszek Siwik,
  • Robert Marcjan,
  • Kamil Włodarczyk

DOI
https://doi.org/10.26636/jtit.2014.2.1026
Journal volume & issue
no. 2

Abstract

When designing and implementing crawling systems or Internet robots, efficiency and scalability must be addressed first (from both a technical and an architectural point of view), because of the enormous size and structural complexity of the World Wide Web. For a significant number of users, however, flexibility and ease of execution are as important as efficiency. Criminal analysts in particular need to define, compose, and run Internet robots and crawlers according to dynamically changing requirements and use cases in the easiest possible way (e.g. in a graphical, drag-and-drop manner). This paper presents the idea, design, crucial architectural elements, Proof-of-Concept (PoC) implementation, and preliminary experimental assessment of the Cassiopeia framework, an all-in-one studio that addresses both of the above-mentioned aspects.

Keywords