Language Semantics Interpretation with an Interaction-Based Recurrent Neural Network

Shaw-Hwa Lo; Yiqiao Yin

doi:10.3390/make3040046

Machine Learning and Knowledge Extraction (Nov 2021)

Affiliations

Shaw-Hwa Lo: Department of Statistics, Columbia University, New York, NY 10027, USA
Yiqiao Yin: Department of Statistics, Columbia University, New York, NY 10027, USA

DOI: https://doi.org/10.3390/make3040046
Journal volume & issue: Vol. 3, no. 4
pp. 922 – 945

Abstract

Text classification is a fundamental task in Natural Language Processing. A variety of sequential models make good predictions, yet the connection between language semantics and prediction results is lacking. This paper proposes a novel influence score (I-score), a greedy search algorithm called the Backward Dropping Algorithm (BDA), and a novel feature engineering technique called the “dagger technique”. First, the I-score is used to detect and search for the important language semantics in text documents that are useful for making good predictions in text classification tasks. Next, the Backward Dropping Algorithm is proposed to handle long-term dependencies in the dataset. Moreover, the “dagger technique” is proposed as a feature engineering method that fully preserves the relationship between the explanatory variable and the response variable. The proposed techniques can be generalized to feed-forward Artificial Neural Networks (ANNs), Convolutional Neural Networks (CNNs), and any other neural network. In a real-world application on the Internet Movie Database (IMDB), the proposed methods improve prediction performance, with an 81% error reduction compared to popular peer models that do not implement the I-score and the “dagger technique”.
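
The abstract names the I-score and the Backward Dropping Algorithm but does not state their formulas. The following is a minimal Python sketch of how such a score-then-drop search might look. It assumes discrete features and an I-score of the partition-based form sum_j n_j^2 (ybar_j - ybar)^2 / (n * var(y)), where j indexes the cells formed by the selected variables. That formula, the normalization, and the function names `i_score` and `backward_dropping` are assumptions made for illustration and may differ from the paper's definitions.

```python
from collections import defaultdict
from statistics import mean, pvariance


def i_score(rows, y, subset):
    """Assumed I-score: sum_j n_j^2 (ybar_j - ybar)^2 / (n * var(y)).

    rows   -- list of tuples of discrete feature values
    y      -- list of numeric responses, one per row
    subset -- list of feature indices that define the partition cells
    """
    n = len(y)
    y_bar = mean(y)
    var_y = pvariance(y)
    if var_y == 0:
        return 0.0
    # Group responses by the joint value of the selected features.
    cells = defaultdict(list)
    for row, label in zip(rows, y):
        cells[tuple(row[k] for k in subset)].append(label)
    total = sum(len(v) ** 2 * (mean(v) - y_bar) ** 2 for v in cells.values())
    return total / (n * var_y)


def backward_dropping(rows, y, start):
    """Greedy search: repeatedly drop one variable, remember the best subset seen."""
    current = list(start)
    best_subset, best_score = list(current), i_score(rows, y, current)
    while len(current) > 1:
        # Tentatively drop each variable; keep the drop that leaves the highest score.
        candidates = [[v for v in current if v != d] for d in current]
        current = max(candidates, key=lambda s: i_score(rows, y, s))
        score = i_score(rows, y, current)
        if score > best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy data (hypothetical): y is the XOR of features 0 and 1; feature 2 is noise.
rows = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
y = [a ^ b for a, b, _ in rows]
subset, score = backward_dropping(rows, y, [0, 1, 2])
print(subset, score)
```

On this toy data the search retains features 0 and 1 and drops the noise feature, even though neither retained feature is informative on its own. This illustrates why a score computed on joint partitions can surface interactions that marginal screening would miss.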

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make