Efficient natural language classification algorithm for detecting duplicate unsupervised features

Saud Altaf; Sofia Iqbal; Muhammad Waseem Soomro

doi:10.15622/ia.2021.3.5

Информатика и автоматизация (Jun 2021)

DOI: https://doi.org/10.15622/ia.2021.3.5
Journal volume & issue: Vol. 20, no. 3
pp. 623 – 653

Abstract

This paper focuses on capturing the meaning of text with Natural Language Understanding (NLU) features in order to detect duplicate unsupervised features. The NLU features are compared with lexical approaches to identify the more suitable classification technique. A transfer-learning approach is used to train the feature extraction on the Semantic Textual Similarity (STS) task. All features are evaluated on two types of datasets: Bosch bug reports and Wikipedia articles. The study aims to structure recent research efforts by comparing NLU concepts for representing the semantics of text and applying them to information retrieval (IR). The main contribution of this paper is a comparative study of semantic similarity measurements. The experimental results report the performance of the Term Frequency–Inverse Document Frequency (TF-IDF) feature on both datasets with a reasonable vocabulary size. They also indicate that a Bidirectional Long Short-Term Memory (BiLSTM) network can learn the structure of a sentence to improve the classification.

Published in Информатика и автоматизация

ISSN: 2713-3192 (Print); 2713-3206 (Online)
Publisher: Russian Academy of Sciences, St. Petersburg Federal Research Center
Country of publisher: Russian Federation
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://ia.spcras.ru/index.php/sp/index
