TwIdw—A Novel Method for Feature Extraction from Unstructured Texts

Kitti Szabó Nagy; Jozef Kapusta

doi:10.3390/app13116438

Applied Sciences (May 2023)

Affiliations

Kitti Szabó Nagy: Department of Informatics, Faculty of Natural Sciences and Informatics, Constantine the Philosopher University in Nitra, 949 01 Nitra, Slovakia
Jozef Kapusta: Department of Informatics, Faculty of Natural Sciences and Informatics, Constantine the Philosopher University in Nitra, 949 01 Nitra, Slovakia

Journal volume & issue: Vol. 13, no. 11, p. 6438

Abstract

This research proposes a novel feature extraction technique for fake news classification using natural language processing (NLP) methods. The proposed technique, TwIdw (Term weight–inverse document weight), is based on TfIdf, with the term frequencies replaced by the depth of the words in the documents. Its effectiveness is compared with that of basic TfIdf. Classification models were built with random forests and feedforward neural networks, each evaluated on three different datasets. With the KaiDMML dataset, the feedforward neural network achieved an increase in accuracy of up to 3.9%. The random forest with TwIdw was less successful than the neural network and improved accuracy only on the KaiDMML dataset (by 1%). The feedforward neural network, by contrast, showed higher accuracy with TwIdw on all three datasets. Precision and recall also confirmed good results, particularly for the neural network. The TwIdw technique could be applied in various NLP tasks, including fake news classification and other NLP classification problems.
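The abstract describes TwIdw as TfIdf with term frequencies replaced by word depth. The Python sketch below illustrates that substitution only under stated assumptions, none of which come from the paper: token depths are supplied by the caller (for example, from a syntactic parse, with the root at depth 1); a term's weight in a document is the sum of the depths of its occurrences; and the standard smoothed idf is kept unchanged. The function names `tfidf`, `depth_weighted`, and `smoothed_idf` are hypothetical.

```python
import math
from collections import Counter


def smoothed_idf(docs):
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(docs):
    """Baseline TfIdf: raw term count multiplied by smoothed idf."""
    idf = smoothed_idf(docs)
    result = []
    for doc in docs:
        counts = Counter(doc)
        result.append({t: counts[t] * idf[t] for t in counts})
    return result


def depth_weighted(docs, depths):
    """Hypothetical TwIdw-style weighting (assumption, not the authors' formula).

    docs:   list of token lists
    depths: parallel list of lists giving the depth of each token
            (assumed root depth = 1, so no token gets weight 0)
    The term weight is the SUM of depths over a term's occurrences,
    which reduces to the raw count when every depth is 1.
    """
    if len(docs) != len(depths):
        raise ValueError("docs and depths must have the same length")
    idf = smoothed_idf(docs)
    result = []
    for doc, doc_depths in zip(docs, depths):
        if len(doc) != len(doc_depths):
            raise ValueError("each document needs one depth per token")
        term_weight = Counter()
        for token, depth in zip(doc, doc_depths):
            term_weight[token] += depth
        result.append({t: w * idf[t] for t, w in term_weight.items()})
    return result


if __name__ == "__main__":
    docs = [["fake", "news", "spreads", "news"], ["real", "news"]]
    depths = [[2, 1, 3, 2], [2, 1]]
    print(tfidf(docs))
    print(depth_weighted(docs, depths))
```

Because the summed-depth weight equals the raw count when every depth is 1, setting all depths to 1 makes the sketch reproduce plain TfIdf, which is a convenient sanity check when comparing the two representations on the same vocabulary.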

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
