Brazilian Archives of Biology and Technology (Jan 2022)

A Critique Empirical Evaluation of Relevance Computation for Focused Web Crawlers

  • Joe Dhanith Pal Nesamony Rose Mary,
  • Surendiran Balasubramanian,
  • Raja Soosaimarian Peter Raj

DOI
https://doi.org/10.1590/1678-4324-2021210223
Journal volume & issue
Vol. 64

Abstract

In step with the spectacular growth of the Internet, demand for coherent and economical crawling methods is clearly set to rise sharply. Consequently, many innovative techniques have been put forward for efficient crawling. The most significant among them is the focused crawler, which searches for web pages relevant to topics defined in advance. Focused crawlers appeal to search engines because of their efficient filtering and their reduced memory and time consumption. This paper presents a survey of web crawling organized around relevance computation. Fifty-two focused crawlers from the existing literature are grouped into four classes: classic focused crawlers, semantic focused crawlers, learning focused crawlers, and ontology learning focused crawlers. The prerequisites and strengths of each metric are discussed with respect to harvest rate, target recall, precision, and F1-score. Shortcomings, strategies, and future outlooks are also suggested.

Keywords