Heliyon (Oct 2024)

MuTCELM: An optimal multi-TextCNN-based ensemble learning for text classification

  • Victor Kwaku Agbesi,
  • Wenyu Chen,
  • Sophyani Banaamwini Yussif,
  • Chiagoziem C. Ukwuoma,
  • Yeong Hyeon Gu,
  • Mugahed A. Al-antari

Journal volume & issue
Vol. 10, no. 19
p. e38515

Abstract

Feature extraction plays a critical role in text classification, as it converts textual data into numerical representations suitable for machine learning models. A key challenge lies in effectively capturing both semantic and contextual information from text at multiple levels of granularity while avoiding overfitting. Prior methods have often performed suboptimally, largely because of the limitations of the feature extraction techniques they employ. To address these challenges, this study introduces Multi-TextCNN, an advanced feature extractor designed to capture essential textual information across multiple levels of granularity. Multi-TextCNN is integrated into a proposed classification model named MuTCELM, which aims to enhance text classification performance. MuTCELM leverages five distinct sub-classifiers, each designed to capture different linguistic features from the text data; these sub-classifiers are combined in an ensemble framework that improves overall performance by exploiting their complementary strengths. Empirical results indicate that MuTCELM achieves average improvements of 0.2584, 0.2546, 0.2668, and 0.2612 in accuracy, precision, recall, and F1-macro score, respectively, across all datasets, demonstrating substantial gains over baseline models. These findings underscore the effectiveness of Multi-TextCNN in improving model performance relative to other ensemble methods. Further analysis shows that the confidence intervals of MuTCELM and the baseline models do not overlap, indicating that the observed improvements are statistically significant rather than attributable to random chance. Together, this evidence demonstrates the robustness and superiority of MuTCELM across languages and text classification tasks.
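The abstract describes the architecture only at a high level. As a concrete illustration, the sketch below shows a minimal PyTorch implementation of a multi-kernel TextCNN feature extractor (n-gram features at several granularities via parallel convolutions) together with a soft-voting ensemble of five sub-classifiers. All names, kernel sizes, dimensions, and the soft-voting combination rule here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiTextCNN(nn.Module):
    """Multi-kernel TextCNN feature extractor (illustrative sketch)."""
    def __init__(self, vocab_size, embed_dim=128, kernel_sizes=(2, 3, 4, 5), num_filters=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # One 1-D convolution per kernel size: each captures n-gram
        # features at a different granularity (bigrams, trigrams, ...).
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.out_dim = num_filters * len(kernel_sizes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, embed_dim, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)
        # Max-pool each convolution's output over the sequence, then
        # concatenate into one multi-granularity feature vector per document.
        feats = [torch.relu(c(x)).max(dim=2).values for c in self.convs]
        return torch.cat(feats, dim=1)

class SoftVotingEnsemble(nn.Module):
    """Averages the class probabilities of several sub-classifiers
    (an assumed combination rule; the paper's may differ)."""
    def __init__(self, sub_classifiers):
        super().__init__()
        self.members = nn.ModuleList(sub_classifiers)

    def forward(self, token_ids):
        probs = [torch.softmax(m(token_ids), dim=1) for m in self.members]
        return torch.stack(probs).mean(dim=0)

# Example: five sub-classifiers sharing this architecture; in the paper
# each member captures different linguistic features of the text.
vocab_size, num_classes = 30000, 4
def make_member():
    extractor = MultiTextCNN(vocab_size)
    return nn.Sequential(extractor, nn.Linear(extractor.out_dim, num_classes))

ensemble = SoftVotingEnsemble([make_member() for _ in range(5)])
batch = torch.randint(0, vocab_size, (8, 50))  # 8 documents, 50 tokens each
print(ensemble(batch).shape)  # torch.Size([8, 4])
```

In this sketch each sub-classifier is identical for brevity; the paper's five sub-classifiers are designed to capture different linguistic features, so in practice they would differ in architecture or input view before their outputs are combined.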

Keywords