DeepCKID: A Multi-Head Attention-Based Deep Neural Network Model Leveraging Classwise Knowledge to Handle Imbalanced Textual Data

Amit Kumar Sah; Muhammad Abulaish

Machine Learning with Applications (Sep 2024)

Affiliations

Amit Kumar Sah: Department of Computer Science, South Asian University, New Delhi, India
Muhammad Abulaish (corresponding author): Department of Computer Science, South Asian University, New Delhi, India

Journal volume & issue: Vol. 17, p. 100575

Abstract

This paper presents DeepCKID, a Multi-Head Attention (MHA)-based deep learning model that exploits statistical and semantic knowledge about the documents of each class in a dataset to improve the detection of minority class instances in imbalanced text classification. For each document, DeepCKID extracts (i) word-level statistical and semantic knowledge, namely the class correlation and class similarity of each word, based on its association with the different classes in the dataset, and (ii) class-level knowledge from the document, using n-grams and relation triplets corresponding to the classwise keywords present, which are identified by cosine similarity using Transformers-based Pre-trained Language Models (PLMs). DeepCKID encodes the word-level and class-level features using deep convolutional networks, which can learn meaningful patterns from them. First, DeepCKID combines the semantically meaningful Sentence-BERT document embeddings with the word-level feature matrix to give the final document representation, which it then fuses with the different classwise encoded representations to strengthen feature propagation. DeepCKID then passes the encoded document representation and its different classwise representations through an MHA layer to identify the important features at different positions of the feature subspaces, resulting in a latent dense vector that accentuates the document's association with a particular class. Finally, DeepCKID passes the latent vector to a softmax layer to predict the corresponding class label. We evaluate DeepCKID on six publicly available Amazon reviews datasets using four Transformers-based PLMs, and compare it with three approaches and four ablation-like baselines. Our study suggests that in most cases, DeepCKID outperforms all the comparison approaches, including the baselines.
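The final stage described in the abstract (an MHA layer over the document representation and its classwise representations, followed by a softmax layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the vector dimension, the number of heads, the random untrained weights, and the mean pooling used to obtain the latent vector are all assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_attention(x, num_heads, rng):
    """Scaled dot-product self-attention over the rows of x (seq_len, d_model).

    Projection weights are random (untrained); a real model would learn them.
    """
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    w_q, w_k, w_v, w_o = (
        rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
        for _ in range(4)
    )

    def split(m):
        # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return m.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                  # (heads, seq, d_head)
    merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return merged @ w_o


def classify(doc_repr, class_reprs, num_heads=4, seed=0):
    """Attend over [document; classwise representations], then apply softmax."""
    rng = np.random.default_rng(seed)
    x = np.vstack([doc_repr[None, :], class_reprs])  # (1 + num_classes, d_model)
    attended = multi_head_attention(x, num_heads, rng)
    latent = attended.mean(axis=0)  # assumption: mean pooling to a latent vector
    num_classes = class_reprs.shape[0]
    w_out = rng.standard_normal((latent.shape[0], num_classes)) / np.sqrt(latent.shape[0])
    return softmax(latent @ w_out)


# Hypothetical inputs: one 16-dimensional document vector and two class vectors.
rng = np.random.default_rng(42)
doc = rng.standard_normal(16)
classes = rng.standard_normal((2, 16))
probs = classify(doc, classes)
```

Here `probs` is a probability distribution over the two hypothetical classes. With untrained weights the values carry no meaning; the sketch only shows how the pieces connect.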

Published in Machine Learning with Applications

ISSN: 2666-8270 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Science: Science (General): Cybernetics; Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: https://www.journals.elsevier.com/machine-learning-with-applications
