A Hybrid Frequency Based, Syntax, and Conditional Random Field Method for Implicit and Explicit Aspect Extraction

Mohammad Mashrekul Kabir; Zulaiha Ali Othman; Mohd Ridzwan Yaakub

doi:10.1109/access.2024.3403479

IEEE Access (Jan 2024)

Affiliations

Mohammad Mashrekul Kabir: Centre for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, Bangi, Malaysia
Zulaiha Ali Othman: Centre for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, Bangi, Malaysia
Mohd Ridzwan Yaakub: Centre for Artificial Intelligence Technology (CAIT), Universiti Kebangsaan Malaysia, Bangi, Malaysia

Journal volume & issue: Vol. 12
pp. 72361 – 72373

Abstract

Aspect extraction is the most important factor influencing the quality of Aspect-Based Sentiment Analysis (ABSA). Aspect extraction approaches fall into three categories: supervised, unsupervised, and hybrid. Most previous aspect extraction algorithms focus on explicit aspects, which alone cannot support meaningful ABSA. Considering implicit aspects has therefore become important, as they account for about 30% of customer reviews. Hybrid approaches have attempted to extract both kinds of aspect by combining frequency-based (FB) approaches with syntax, Conditional Random Fields (CRF) with syntax, and FB with CRF, but their performance remains low. Implicit and compound noun aspects are difficult to extract because of their hidden nature and special syntactic requirements: implicit aspects need FB, syntax dependency, and specialized tagging with feature embedding using natural language processing (NLP), while compound noun aspects need FB and syntax-dependency-based NLP. This paper therefore proposes a novel hybrid method that combines frequency-based, syntax-dependency-based, and CRF algorithms using tagging and labeling to extract both implicit and explicit aspects. It also provides a mechanism for extracting and calculating implicit and explicit aspects and a method for accurately identifying compound noun aspects. Experiments on the benchmark SemEval and Amazon review datasets validate the effectiveness of the proposed hybrid model. The proposed method markedly improved the extraction of explicit and implicit aspects and raised overall precision, recall, and accuracy by 5 to 15 percent. This study also reports the number and list of explicit and implicit aspects: the proposed method obtained (2269, 368), (523, 60), and (322, 55) explicit versus implicit aspects for the SemEval 16, Amazon (Canon), and Amazon (Nokia) datasets, respectively.
Hybridizing frequency-based extraction, CRF's superior tagging, and syntax helps extract implicit and explicit aspects, including compound nouns.
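The abstract combines three ingredients: frequency-based candidate selection, a syntactic rule for compound nouns, and token labeling for a CRF. The block below is a minimal sketch of how such a pipeline can be wired together. It is not the authors' implementation and makes several assumptions: the input is already POS-tagged with Penn Treebank tags; merging adjacent nouns is a simple stand-in for the paper's syntax-dependency rule for compound nouns; `IMPLICIT_LEXICON` (an opinion-word to implicit-aspect map) and the `min_support` threshold are hypothetical; and the code only produces BIO labels that a CRF could be trained on, without training the CRF itself.

```python
from collections import Counter

# Hypothetical opinion-word -> implicit-aspect lexicon (illustrative only,
# not taken from the paper).
IMPLICIT_LEXICON = {
    "expensive": "price",
    "cheap": "price",
    "heavy": "weight",
    "light": "weight",
}


def noun_phrases(sentence):
    """Merge consecutive noun tokens (NN, NNS, NNP, NNPS) into compound-noun candidates.

    Simplification: adjacency of nouns stands in for a dependency-based compound rule.
    """
    phrases, current = [], []
    for word, tag in sentence:
        if tag.startswith("NN"):
            current.append(word.lower())
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def frequent_aspects(corpus, min_support=2):
    """Frequency-based step: keep candidates that occur in >= min_support sentences."""
    counts = Counter()
    for sentence in corpus:
        counts.update(set(noun_phrases(sentence)))
    return {phrase for phrase, n in counts.items() if n >= min_support}


def implicit_aspects(sentence):
    """Map adjectives found in the hypothetical lexicon to the aspect they imply."""
    return [IMPLICIT_LEXICON[w.lower()] for w, tag in sentence
            if tag.startswith("JJ") and w.lower() in IMPLICIT_LEXICON]


def bio_labels(sentence, aspects):
    """Tag explicit aspects as B-ASP / I-ASP / O, e.g. as CRF training targets.

    Longer aspects are matched first so a compound noun is not split by a shorter one.
    """
    words = [w.lower() for w, _ in sentence]
    labels = ["O"] * len(words)
    for aspect in sorted(aspects, key=lambda a: len(a.split()), reverse=True):
        parts = aspect.split()
        for i in range(len(words) - len(parts) + 1):
            span = range(i, i + len(parts))
            if words[i:i + len(parts)] == parts and all(labels[k] == "O" for k in span):
                labels[i] = "B-ASP"
                for k in range(i + 1, i + len(parts)):
                    labels[k] = "I-ASP"
    return labels


# Toy POS-tagged review sentences (made up for illustration).
corpus = [
    [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("Battery", "NN"), ("life", "NN"), ("could", "MD"), ("be", "VB"), ("longer", "JJR")],
    [("It", "PRP"), ("is", "VBZ"), ("too", "RB"), ("expensive", "JJ")],
    [("The", "DT"), ("screen", "NN"), ("is", "VBZ"), ("sharp", "JJ")],
]

if __name__ == "__main__":
    aspects = frequent_aspects(corpus)
    print(aspects)                           # {'battery life'}
    print(implicit_aspects(corpus[2]))       # ['price']
    print(bio_labels(corpus[0], aspects))    # ['O', 'B-ASP', 'I-ASP', 'O', 'O']
```

On this toy corpus, "battery life" survives the frequency filter as a single compound-noun aspect, "screen" is dropped because it appears only once, and the sentence "It is too expensive" yields the implicit aspect "price" even though no aspect noun is present. In a full system the BIO labels would feed a sequence labeler such as a CRF, which can then generalize beyond the frequency and lexicon rules.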

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639