Applied Sciences (Feb 2024)

Two-Stage Dimensionality Reduction for Social Media Engagement Classification

  • Jose Luis Vieira Sobrinho
  • Flavio Henrique Teles Vieira
  • Alisson Assis Cardoso

DOI
https://doi.org/10.3390/app14031269
Journal volume & issue
Vol. 14, no. 3
p. 1269

Abstract

The high dimensionality of real-life datasets is one of the biggest challenges in machine learning. The higher the dimension of the input data, the more computational resources are needed and the more difficult the learning task becomes, a phenomenon commonly referred to as the curse of dimensionality. Building on this premise, we propose a two-stage dimensionality reduction (TSDR) method for data classification. The first stage extracts high-quality features into a new subset by maximizing the pairwise separation probability, with the aim of avoiding the class masking problem, in which individuals from different classes that lie close to one another overlap. The second stage takes the subset produced by the first stage and transforms it into a reduced final space in a way that maximizes the distance between the cluster centers of different classes while minimizing the dispersion of instances within the same class. The second stage thus aims to improve the accuracy of the subsequent classifier by lowering its sensitivity to an imbalanced distribution of instances across classes. Experiments on benchmark and social media datasets indicate that the proposed method is promising compared with several well-established algorithms, especially for social media engagement classification.
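
The second-stage objective described above (class centers far apart, instances of the same class tightly grouped) resembles the classic Fisher discriminant criterion. The sketch below illustrates that general idea only; it is not the TSDR algorithm, whose exact formulation is not given in the abstract. The function names `project` and `fisher_ratio`, the toy data, and the class-size weighting are all assumptions made for illustration.

```python
from collections import defaultdict


def project(points, direction):
    """Project each point onto a 1-D axis given by `direction` (dot product)."""
    return [sum(p_i * d_i for p_i, d_i in zip(p, direction)) for p in points]


def fisher_ratio(values, labels):
    """Fisher-style score for 1-D values: between-class scatter of the
    class centers divided by the within-class scatter.

    Assumption: each class center is weighted by its class size, as in the
    textbook Fisher criterion; the paper may weight classes differently.
    """
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)

    overall_mean = sum(values) / len(values)
    between = 0.0  # spread of class centers around the overall mean
    within = 0.0   # dispersion of instances around their own class center
    for members in groups.values():
        center = sum(members) / len(members)
        between += len(members) * (center - overall_mean) ** 2
        within += sum((v - center) ** 2 for v in members)

    return between / within if within > 0 else float("inf")


# Hypothetical toy data: two classes separated along x, identical along y.
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0),
          (3.0, 0.0), (4.0, 1.0), (3.0, 2.0)]
labels = [0, 0, 0, 1, 1, 1]

score_x = fisher_ratio(project(points, (1.0, 0.0)), labels)
score_y = fisher_ratio(project(points, (0.0, 1.0)), labels)
```

On this toy data, projecting onto the x axis scores higher than projecting onto the y axis, because only the x axis separates the two class centers. A reduction step driven by such a criterion would therefore prefer the x direction.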

Keywords