Journal of Applied Science and Engineering (Sep 2022)

Web Scraping Tool For Newspapers And Images Data Using Jsonify

  • Qingli Niu,
  • Irfan Ali Kandhro,
  • Anil Kumar,
  • Shahnawaz shah,
  • Muhammad Hasan,
  • Hifza Mehfooz Ahmed,
  • Fei Liang

DOI
https://doi.org/10.6180/jase.202304_26(4).0002
Journal volume & issue
Vol. 26, no. 4
pp. 465 – 474

Abstract


Web scraping is the process of extracting data from websites quickly and efficiently. Python offers a useful set of methods for this task that can help web editors improve the quality of the services they provide. The proposed scraper works in three steps: 1) understand the structure of the web page, 2) design regular expression patterns, and 3) use those patterns to extract the required data. We use the Flask, Request, and JSONify libraries to obtain the data; after processing, the data is transformed into JSON and made available for CSV export through an API. Once all required regex patterns have been generated, the system applies them as a set of rules. With these rules and supporting libraries for storing and extracting news and other web-based information, the scraper works efficiently and achieved outstanding results. By automating the process, the proposed tool eliminates the time and effort of manually collecting or copying data. We find that the scraper offers an easy and direct approach to extracting data from newspapers, websites, blogs, and images.
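The pipeline summarized in the abstract (inspect the page structure, design regex patterns, apply them as rules, then emit JSON and CSV) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The sample HTML, its `<article>`/`headline` markup, and the helper names `scrape`, `to_json`, and `to_csv` are hypothetical. The standard-library `json` module stands in for Flask's `jsonify` so the example runs without third-party packages.

```python
import csv
import io
import json
import re
from typing import Dict, List

# Hypothetical sample page; a real scraper would fetch this HTML from a news site.
SAMPLE_HTML = """
<article><h2 class="headline">Rain expected over the weekend</h2>
<img src="/images/rain.jpg" alt="Clouds"></article>
<article><h2 class="headline">City opens new library</h2>
<img src="/images/library.png" alt="Library"></article>
"""

# Step 2: regex patterns designed after inspecting the page structure (step 1).
# These patterns are assumptions that only match the hypothetical markup above.
ARTICLE = re.compile(r"<article>(.*?)</article>", re.S)
HEADLINE = re.compile(r'<h2 class="headline">(.*?)</h2>', re.S)
IMAGE = re.compile(r'<img src="([^"]+)"')


def scrape(html: str) -> List[Dict[str, str]]:
    """Step 3: apply the patterns as rules, one record per article block."""
    records = []
    for block in ARTICLE.findall(html):
        headline = HEADLINE.search(block)
        image = IMAGE.search(block)
        records.append({
            "headline": headline.group(1).strip() if headline else "",
            "image": image.group(1) if image else "",
        })
    return records


def to_json(records: List[Dict[str, str]]) -> str:
    """Serialize records to JSON (stand-in for Flask's jsonify)."""
    return json.dumps(records, indent=2)


def to_csv(records: List[Dict[str, str]]) -> str:
    """Write the same records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["headline", "image"])
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


if __name__ == "__main__":
    data = scrape(SAMPLE_HTML)
    print(to_json(data))
    print(to_csv(data))
```

In a Flask application, the same `records` list could be returned from a route with `flask.jsonify(records)`. Regular expressions keep the sketch short, but they break easily when a site changes its markup; an HTML parser is usually more robust for production scraping.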

Keywords