Heliyon (May 2024)

Sentiment analysis of Arabic social media texts: A machine learning approach to deciphering customer perceptions

  • Ohud Alsemaree,
  • Atm S. Alam,
  • Sukhpal Singh Gill,
  • Steve Uhlig

Journal volume & issue
Vol. 10, no. 9
p. e27863

Abstract

Sentiment analysis (SA) is a subfield of artificial intelligence that draws on natural language processing. It has become increasingly significant because it discerns the emotional tone of reviews, categorising them as positive, neutral, or negative. In the highly competitive coffee industry, understanding customer sentiment and perception is paramount for businesses seeking to optimise their product offerings. Traditional methods of market analysis often fall short of capturing the nuanced views of consumers, necessitating a more sophisticated approach. This research is motivated by the need for a nuanced understanding of customer sentiment across various coffee products, enabling companies to make informed decisions regarding product promotion, improvement, and discontinuation. However, analysing Arabic text is challenging because of the language's extraordinarily complex inflectional and derivational morphology. To address this challenge, we have developed a new method designed to improve the precision and effectiveness of Arabic sentiment analysis, focusing specifically on customer opinions about various coffee products on social media platforms such as Twitter. We gathered 10,646 Twitter reviews of various coffee products and applied feature extraction using term frequency-inverse document frequency (TF-IDF) and minimum redundancy maximum relevance (MRMR). Subsequently, we performed sentiment analysis using four supervised learning algorithms: k-nearest neighbor, support vector machine, decision tree, and random forest. The predictions of these classifiers were then aggregated via ensemble learning to produce the final results. Our results demonstrated an increase in prediction accuracy, with our method achieving 95.95% accuracy with hard voting and 94.51% with soft voting.
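
The difference between the two ensemble schemes named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `hard_vote` and `soft_vote` are hypothetical, and the per-classifier probabilities are invented for a single imaginary review purely to show how the two rules can disagree. Hard voting takes the majority of the predicted labels, whereas soft voting averages the class probabilities and picks the largest.

```python
from collections import Counter

# The three sentiment classes named in the abstract.
LABELS = ("negative", "neutral", "positive")


def hard_vote(predicted_labels):
    """Return the label predicted by the largest number of classifiers."""
    return Counter(predicted_labels).most_common(1)[0][0]


def soft_vote(probabilities, labels=LABELS):
    """Average per-class probabilities across classifiers, then take the argmax."""
    n_models = len(probabilities)
    averaged = [
        sum(model_probs[i] for model_probs in probabilities) / n_models
        for i in range(len(labels))
    ]
    return labels[averaged.index(max(averaged))]


# Invented (illustrative) class probabilities for ONE review, one row per
# classifier, in the order: negative, neutral, positive.
probs = [
    [0.10, 0.50, 0.40],  # k-nearest neighbor
    [0.05, 0.50, 0.45],  # support vector machine
    [0.00, 0.00, 1.00],  # decision tree
    [0.15, 0.45, 0.40],  # random forest
]

# Each classifier's own prediction is its most probable class.
labels_per_model = [LABELS[p.index(max(p))] for p in probs]

print(hard_vote(labels_per_model))  # majority of labels
print(soft_vote(probs))             # argmax of averaged probabilities
```

In this made-up case three classifiers lean weakly towards neutral while one is certain of positive, so hard voting returns neutral and soft voting returns positive. Which rule performs better on real data is an empirical question; the abstract reports the higher accuracy for hard voting.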

Keywords