Ovidius University Annals: Economic Sciences Series (Aug 2023)

Understanding Customers' Opinion using Web Scraping and Natural Language Processing

  • Alin-Gabriel Vaduva,
  • Simona-Vasilica Oprea,
  • Dragos-Catalin Barbu

Journal volume & issue
Vol. XXIII, no. 1
pp. 537 – 544

Abstract

The web offers large volumes of unstructured data that cannot be processed further unless they are extracted and organized into local variables or databases. In this paper, we aim to extract data from the Internet using web scraping and analyse it with Natural Language Processing (NLP). Our purpose is to understand customers’ opinions by extracting hotel reviews and investigating them in Python. The polarity (positive or negative) of the reviews, together with a word cloud, offers additional tools to understand customers, predict their behaviour and pinpoint the problems signalled in the reviews. TextBlob and BERTweet are applied to analyse the reviews. To aid the interpretation of the results, the classifications produced by the BERTweet model are compared with those produced by TextBlob, a widely used Python library for various NLP tasks. Furthermore, the reviews are pre-processed to remove line breaks, punctuation characters, etc., and an n-gram analysis is performed to better understand the positive and negative reviews. The frequency analysis of the reviews reveals the concrete problems faced by customers visiting the hotel in different seasons, which helps decision makers take measures to improve the quality of the hotel's services.
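The abstract outlines a pipeline of scraping reviews, cleaning them of line breaks and punctuation, and counting n-grams. The following is a minimal, standard-library-only sketch of those three steps under stated assumptions: the HTML is an inline sample rather than a downloaded page, and the CSS class `review-text`, the stopword list and the sample reviews are hypothetical placeholders, not taken from the paper. The TextBlob and BERTweet sentiment classification is not reproduced here.

```python
import string
from collections import Counter
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "and", "at", "but", "in", "is", "it", "of", "the",
             "to", "was", "were"}


class ReviewExtractor(HTMLParser):
    """Collect the text of every element whose class list contains css_class.

    "review-text" is a hypothetical class name; a real site needs its own selector.
    """

    def __init__(self, css_class="review-text"):
        super().__init__()
        self.css_class = css_class
        self._depth = 0      # > 0 while inside a matching element
        self._parts = []     # text fragments of the current review
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self._depth and tag == "br":
                self._parts.append("\n")  # keep the line break; clean() removes it
            return
        if self._depth:
            self._depth += 1
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.reviews.append(" ".join(self._parts))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def extract_reviews(html, css_class="review-text"):
    """Return the text of all matching review elements in an HTML string."""
    parser = ReviewExtractor(css_class)
    parser.feed(html)
    parser.close()
    return parser.reviews


def clean(text):
    """Drop line breaks and ASCII punctuation, lowercase, remove stopwords."""
    text = text.replace("\r", " ").replace("\n", " ").lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOPWORDS]


def ngrams(tokens, n):
    """All contiguous n-token sequences of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(reviews, n=2, k=3):
    """Count n-grams per review (never across two reviews); return the k most common."""
    counts = Counter()
    for review in reviews:
        counts.update(ngrams(clean(review), n))
    return counts.most_common(k)


# Hypothetical sample markup standing in for a scraped review page.
SAMPLE_HTML = """
<div class="review"><p class="review-text">Dirty room.<br>Noisy air conditioning!</p></div>
<div class="review"><p class="review-text">The air conditioning was noisy, dirty room too.</p></div>
<div class="review"><p class="review-text">Friendly staff, great breakfast.</p></div>
"""

if __name__ == "__main__":
    for gram, count in top_ngrams(extract_reviews(SAMPLE_HTML), n=2, k=2):
        print(" ".join(gram), count)
    # prints:
    # dirty room 2
    # air conditioning 2
```

Counting n-grams per review, rather than over the concatenated text, keeps a bigram from spanning two unrelated reviews. A real scraper would first download the page (for example with `urllib.request.urlopen`) and would need a selector that matches the target site's actual markup.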

Keywords