Adapting Hidden Naive Bayes for Text Classification

Shengfeng Gan; Shiqi Shao; Long Chen; Liangjun Yu; Liangxiao Jiang

doi:10.3390/math9192378

Mathematics (Sep 2021)

Affiliations

Shengfeng Gan: College of Computer, Hubei University of Education, Wuhan 430205, China
Shiqi Shao: School of Computer Science, China University of Geosciences, Wuhan 430074, China
Long Chen: School of Computer Science, China University of Geosciences, Wuhan 430074, China
Liangjun Yu: College of Computer, Hubei University of Education, Wuhan 430205, China
Liangxiao Jiang: School of Computer Science, China University of Geosciences, Wuhan 430074, China

DOI: https://doi.org/10.3390/math9192378
Journal volume & issue: Vol. 9, no. 19, p. 2378

Abstract

Due to its simplicity, efficiency, and effectiveness, multinomial naive Bayes (MNB) has been widely used for text classification. As in naive Bayes (NB), its assumption that features are conditionally independent given the class is often violated in practice, which reduces its classification performance. Among the numerous approaches to alleviating this assumption, structure extension has attracted less attention from researchers. To the best of our knowledge, structure-extended MNB (SEMNB) is the only such model proposed so far. SEMNB averages all weighted super-parent one-dependence multinomial estimators and is therefore an ensemble learning model. In this paper, we propose a single model, called hidden MNB (HMNB), by adapting the well-known hidden NB (HNB). HMNB creates a hidden parent for each feature that synthesizes the influences of all the other qualified features. To learn HMNB, we propose a simple but effective algorithm that avoids a computationally expensive structure-learning process. The same idea can also be used to improve complement NB (CNB) and the one-versus-all-but-one model (OVA); the resulting models are denoted HCNB and HOVA, respectively. Extensive experiments on eleven benchmark text classification datasets validate the effectiveness of HMNB, HCNB, and HOVA.
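The hidden-parent idea described above can be illustrated with a small Python sketch. It is not the authors' implementation, and several details are assumptions:

- Documents are dense word-count vectors.
- The "qualified" features for a word are assumed to be the other words that occur in the document being classified.
- P(w_i | w_j, c) is assumed to be a Laplace-smoothed multinomial estimate over the class-c training documents that contain w_j.
- The hidden-parent weights W_ij are assumed to be conditional mutual information between binary word-presence indicators, normalized over the qualified parents, as in the original HNB.

The function names `cmi`, `train_hmnb`, and `predict_hmnb` are hypothetical.

```python
import math

def cmi(B, y, i, j, classes):
    """Assumed weight: smoothed conditional mutual information I(W_i; W_j | C) on binary presence."""
    n, total = len(y), 0.0
    for c in classes:
        rows = [b for b, lab in zip(B, y) if lab == c]
        nc = len(rows)
        p_c = (nc + 4) / (n + 4 * len(classes))  # smoothed class weight
        for a in (0, 1):
            n_ac = sum(1 for b in rows if b[i] == a)
            for v in (0, 1):
                n_bc = sum(1 for b in rows if b[j] == v)
                n_abc = sum(1 for b in rows if b[i] == a and b[j] == v)
                p_ab = (n_abc + 1) / (nc + 4)
                p_a, p_b = (n_ac + 1) / (nc + 2), (n_bc + 1) / (nc + 2)
                total += p_c * p_ab * math.log(p_ab / (p_a * p_b))
    return max(0.0, total)  # guard against tiny negative round-off

def train_hmnb(X, y):
    """X: list of word-count vectors of length m; y: list of class labels."""
    m, n, classes = len(X[0]), len(y), sorted(set(y))
    model = {"m": m, "classes": classes, "prior": {}, "mnb": {}, "cond": {}}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        model["prior"][c] = (len(rows) + 1) / (n + len(classes))
        tot = sum(sum(x) for x in rows)
        # Plain MNB estimate P(w_i | c), used when a word has no qualified parent.
        model["mnb"][c] = [(1 + sum(x[i] for x in rows)) / (m + tot) for i in range(m)]
        # cond[j][i] ~ P(w_i | w_j, c), counted over class-c documents containing w_j.
        cond = []
        for j in range(m):
            sub = [x for x in rows if x[j] > 0]
            sub_tot = sum(sum(x) for x in sub)
            cond.append([(1 + sum(x[i] for x in sub)) / (m + sub_tot) for i in range(m)])
        model["cond"][c] = cond
    B = [[1 if f > 0 else 0 for f in row] for row in X]  # binary word presence
    model["W"] = [[cmi(B, y, i, j, classes) if i != j else 0.0 for j in range(m)]
                  for i in range(m)]
    return model

def predict_hmnb(model, f):
    """Return the class maximizing log P(c) + sum_i f_i * log P(w_i | hidden parent, c)."""
    present = [j for j in range(model["m"]) if f[j] > 0]
    best, best_score = None, -math.inf
    for c in model["classes"]:
        score = math.log(model["prior"][c])
        for i in present:
            parents = [j for j in present if j != i]
            if not parents:
                p = model["mnb"][c][i]  # no qualified parent: fall back to MNB
            else:
                w = [model["W"][i][j] for j in parents]
                z = sum(w)
                w = [wj / z for wj in w] if z > 0 else [1 / len(parents)] * len(parents)
                # Hidden parent: weighted mixture of one-dependence estimates.
                p = sum(wj * model["cond"][c][j][i] for wj, j in zip(w, parents))
            score += f[i] * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best

# Toy usage: words 0 and 2 occur only in "sport" docs, words 1 and 3 only in "tech" docs.
X = [[3, 0, 2, 0], [2, 0, 1, 0], [4, 0, 3, 0],
     [0, 2, 0, 3], [0, 3, 0, 1], [0, 1, 0, 2]]
y = ["sport"] * 3 + ["tech"] * 3
model = train_hmnb(X, y)
print(predict_hmnb(model, [2, 0, 1, 0]))  # sport
print(predict_hmnb(model, [0, 2, 0, 2]))  # tech
```

Each word's hidden parent is computed once per test document as a weighted mixture of one-dependence estimates. Unlike SEMNB, this does not average a whole ensemble of super-parent models. The quadratic `cond` and `W` tables are kept deliberately naive for readability.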

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics