IEEE Access (Jan 2024)

A Hybrid Frequency Based, Syntax, and Conditional Random Field Method for Implicit and Explicit Aspect Extraction

  • Mohammad Mashrekul Kabir,
  • Zulaiha Ali Othman,
  • Mohd Ridzwan Yaakub

DOI
https://doi.org/10.1109/ACCESS.2024.3403479
Journal volume & issue
Vol. 12
pp. 72361 – 72373

Abstract

Aspect extraction is the most important factor influencing the quality of Aspect-Based Sentiment Analysis (ABSA). Aspect extraction approaches fall into three categories: supervised, unsupervised, and hybrid. Most previous aspect extraction algorithms focus on explicit aspects only, which limits how meaningful the resulting ABSA can be. Implicit aspects therefore deserve attention, as they make up about 30% of most customer reviews. Hybrid approaches have attempted to handle both kinds of aspect by integrating frequency-based (FB) approaches with syntax, Conditional Random Fields (CRF) with syntax, and FB with CRF, but their performance is still low. Extracting implicit and compound noun aspects is challenging because of their hidden nature and special syntactic requirements. Implicit aspects need FB, syntax dependency, and specialized tagging with feature embedding in a natural language processing (NLP) solution, while compound noun aspects need an NLP solution based on FB and syntax dependency. Therefore, this paper proposes a novel hybrid method that combines frequency-based, syntax-based dependency, and CRF algorithms, using tagging and labeling, to extract both implicit and explicit aspects. It also provides a mechanism for extracting and calculating implicit and explicit aspects and a method for accurately identifying compound noun aspects. Experiments on the benchmark SemEval and Amazon review datasets validate the effectiveness of the proposed hybrid model. The proposed method markedly improved the extraction of explicit and implicit aspects and improved overall precision, recall, and accuracy by 5 to 15 percent. Moreover, this study reports the number and the list of explicit and implicit aspects. The proposed method obtained (2269, 368), (523, 60), and (322, 55) explicit versus implicit aspects for the SemEval 16, Amazon (Canon), and Amazon (Nokia) datasets, respectively.
Combining the frequency-based approach with CRF's superior tagging and with syntax helps extract both implicit and explicit aspects, including compound nouns.
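
The (explicit, implicit) counts reported above can be turned into the share of extracted aspects that are implicit in each dataset. The short sketch below does only that arithmetic; the function name `implicit_share` and the dictionary layout are illustrative assumptions and are not taken from the paper.

```python
# Counts as reported in the abstract: (explicit, implicit) per dataset.
reported = {
    "SemEval 16": (2269, 368),
    "Amazon (Canon)": (523, 60),
    "Amazon (Nokia)": (322, 55),
}


def implicit_share(explicit: int, implicit: int) -> float:
    """Return the fraction of all extracted aspects that are implicit."""
    return implicit / (explicit + implicit)


# Print the implicit share for each dataset as a percentage.
for name, (explicit, implicit) in reported.items():
    print(f"{name}: {implicit_share(explicit, implicit):.1%} implicit")
```

These shares describe the aspects the method extracted, not the proportion of review sentences that contain an implicit aspect.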

Keywords