Density-Based Clustering to Deal with Highly Imbalanced Data in Multi-Class Problems

Julio Cesar Munguía Mondragón; Eréndira Rendón Lara; Roberto Alejo Eleuterio; Everardo Efrén Granda Gutiérrez; Federico Del Razo López

doi:10.3390/math11184008

Mathematics (Sep 2023)

Affiliations

Julio Cesar Munguía Mondragón: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico
Eréndira Rendón Lara: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico
Roberto Alejo Eleuterio: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico
Everardo Efrén Granda Gutiérrez: University Center at Atlacomulco, Autonomous University of the State of Mexico (UAEMex), Atlacomulco 50400, Estado de Mexico, Mexico
Federico Del Razo López: Division of Postgraduate Studies and Research, National Technological of Mexico (TecNM), Instituto Tecnológico de Toluca, Metepec 52149, Estado de Mexico, Mexico

DOI: https://doi.org/10.3390/math11184008
Journal volume & issue: Vol. 11, no. 18, p. 4008

Abstract

In machine learning and data mining applications, an imbalanced distribution of classes in the training dataset can drastically affect the performance of learning models. The class imbalance problem is frequently observed in real-world classification tasks, when one class has far fewer available instances than the other classes. Machine learning algorithms that do not account for class imbalance can be strongly biased towards the majority class, while the minority class is usually neglected. Sampling techniques, mainly based on random undersampling and oversampling, have therefore been used extensively to overcome class imbalance. However, there is still no definitive solution, especially for multi-class problems. This work studies a strategy that combines density-based clustering algorithms with random undersampling and oversampling techniques. To analyze the performance of the studied method, an experimental validation was carried out on a collection of hyperspectral remote sensing images, with a deep learning neural network as the classifier. This data bank contains six datasets with different imbalance ratios, from slight to severe. In the experiments, the studied method outperforms other state-of-the-art methods in classification performance, measured by the geometric mean of the precision, mainly on highly imbalanced datasets.
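
The abstract does not spell out the exact pipeline, so the following is only an illustrative sketch of how a density-based filter might be combined with random undersampling and oversampling. Everything in it is an assumption made for illustration: the function names `density_filter` and `rebalance`, the parameters `eps` and `min_pts`, the use of DBSCAN's core-point test in place of a full clustering algorithm, and the choice of the mean class size as the resampling target. It should not be read as the authors' method.

```python
import math
import random
from collections import defaultdict


def density_filter(points, eps, min_pts):
    """Keep points with at least min_pts neighbours (itself included) within eps.

    This is only the core-point test used by DBSCAN, not the full algorithm.
    """
    kept = []
    for p in points:
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_pts:
            kept.append(p)
    return kept


def rebalance(samples, labels, eps=1.0, min_pts=3, seed=0):
    """Filter each class by density, then resample every class to a common size."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    # Step 1 (assumed): drop low-density points class by class;
    # fall back to the unfiltered class if the filter would empty it.
    filtered = {}
    for y, pts in by_class.items():
        dense = density_filter(pts, eps, min_pts)
        filtered[y] = dense if dense else pts

    # Step 2 (assumed): the target size is the mean class size after filtering.
    target = round(sum(len(p) for p in filtered.values()) / len(filtered))

    out_x, out_y = [], []
    for y, pts in filtered.items():
        if len(pts) > target:
            # Random undersampling: draw without replacement.
            chosen = rng.sample(pts, target)
        else:
            # Random oversampling: duplicate randomly chosen points.
            extra = [rng.choice(pts) for _ in range(target - len(pts))]
            chosen = pts + extra
        out_x.extend(chosen)
        out_y.extend([y] * len(chosen))
    return out_x, out_y
```

Under these assumptions, calling `rebalance(X, y)` on a list of coordinate tuples `X` and class labels `y` returns a resampled dataset in which every class has the same number of instances and isolated points have been discarded before resampling.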

Published in Mathematics

ISSN: 2227-7390 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Mathematics
Website: http://www.mdpi.com/journal/mathematics
