Sistemasi: Jurnal Sistem Informasi (May 2024)

A Web Scraper for Data Mining Purposes

  • Yasir Ali Mahmood
  • Bassim Mahmood

DOI
https://doi.org/10.32520/stmsi.v13i3.4107
Journal volume & issue
Vol. 13, no. 3
pp. 1243 – 1252

Abstract

The current revolution in technology has made data a crucial part of real-life applications because of its importance in decision-making. In the era of big data, with data streams expanding massively across Internet networks and platforms, collecting, mining, and analyzing data has become a difficult task. Auxiliary applications for data mining and gathering have therefore become a necessity. Companies usually offer special APIs for collecting data from particular sources, but these come at a high cost. The literature also severely lacks approaches that offer flexible, low-cost, or free tools for web scraping. Hence, this article provides a free tool that can be used for data mining and data collection from the web. Specifically, an efficient Google Scholar web scraper is introduced. The extracted data can be used for analysis and for making decisions about an issue of interest. The proposed scraper can also be modified to crawl web links and retrieve specific data from a particular website. It can also format the collected data as a ready-to-use dataset for the analysis phase. The efficiency of the proposed scraper is tested in terms of time consumption, accuracy, and the quality of the data collected. The findings show that the proposed approach is highly feasible for data collection and can be adopted by data analysts.
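
As a rough illustration of the idea described in the abstract (extracting specific fields from a results page and writing them out as a ready dataset), the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical page layout: the class names `result`, `title`, and `year`, the sample HTML, and the helper names `parse_results` and `to_csv` are all invented for this example. It parses an inline HTML string rather than fetching a live page, so no real site's markup or terms of use are implied.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup (an assumption for this sketch): each search result is a
# <div class="result"> holding an <a class="title" href="..."> and a
# <span class="year">. Real pages will differ.
SAMPLE_HTML = """
<div class="result"><a class="title" href="https://example.org/a">Paper A</a><span class="year">2021</span></div>
<div class="result"><a class="title" href="https://example.org/b">Paper B</a><span class="year">2023</span></div>
"""


class ResultParser(HTMLParser):
    """Collects one dict per result block from the assumed markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        cls = attrs.get("class")
        if tag == "div" and cls == "result":
            # Start a new record for each result block.
            self.rows.append({"title": "", "link": "", "year": ""})
        elif self.rows and tag == "a" and cls == "title":
            self.rows[-1]["link"] = attrs.get("href", "")
            self._field = "title"
        elif self.rows and tag == "span" and cls == "year":
            self._field = "year"

    def handle_data(self, data):
        # Only keep text that appears inside a field we are tracking.
        if self._field:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None


def parse_results(html):
    """Return a list of {'title', 'link', 'year'} dicts found in the HTML."""
    parser = ResultParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def to_csv(rows):
    """Serialize the extracted rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["title", "link", "year"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = parse_results(SAMPLE_HTML)
csv_text = to_csv(rows)
```

In this sketch the extraction step (`parse_results`) is kept separate from the dataset step (`to_csv`), so the same rows could be written to a different format, or the parser could be swapped for one matching another site's markup, without changing the rest.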