A New Approach for Developing Segmentation Algorithms for Strongly Imbalanced Data

Kazuki Fujiwara; Maiko Shigeno; Ushio Sumita

doi:10.1109/ACCESS.2019.2923524

IEEE Access (Jan 2019)

Affiliations

Kazuki Fujiwara: Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
Maiko Shigeno: Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan
Ushio Sumita: Graduate School of Systems and Information Engineering, University of Tsukuba, Ibaraki, Japan

DOI: https://doi.org/10.1109/ACCESS.2019.2923524
Journal volume & issue: Vol. 7
pp. 82970 – 82977

Abstract

Over the past two decades, the problem of developing efficient segmentation algorithms for strongly imbalanced data has attracted considerable attention from researchers and practitioners in data mining. A typical approach to this difficult problem is random under-sampling, in which the cardinality of the majority set is reduced to that of the minority set through random sampling, thereby enabling the use of standard classifiers such as Logistic Regression, Support Vector Machine (SVM), and Random Forest. When the resulting segmentation algorithm is applied to testing data with the original imbalance, however, its performance can be rather limited. To improve the performance, a bagged under-sampling (BUS) approach has been introduced, in which random under-sampling is repeated M times, though the effect of BUS is still not quite satisfactory. The first purpose of this paper is to enhance the performance of BUS by developing a novel approach in which BUS is employed repetitively. While the improvement of this approach (R-BUS) over BUS is recognizable, it is still not sufficient from a practical point of view, especially when the dimension of the underlying binary profile vectors is large. The second purpose of this paper is to establish a rank reduction (RR) approach for reducing this large dimension. The combined use of R-BUS with RR provides excellent performance, as demonstrated through a large-scale real-world application.
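
To make the baseline concrete, the following is a minimal sketch of random under-sampling and of BUS as the abstract describes them: the majority set is cut down to the size of the minority set, and this is repeated M times. Several details are assumptions made for illustration and are not taken from the paper: the minority class is labeled 1, a toy nearest-centroid rule stands in for the standard classifiers named above, the M models are combined by majority vote, and all function names are hypothetical. R-BUS and RR are not sketched because the abstract does not specify them.

```python
import random

def under_sample(X, y, rng):
    """Randomly reduce the majority class (label 0) to the minority size (label 1)."""
    minority = [i for i, label in enumerate(y) if label == 1]
    majority = [i for i, label in enumerate(y) if label == 0]
    kept = rng.sample(majority, len(minority))  # sampling without replacement
    idx = minority + kept
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_centroids(X, y):
    """Toy stand-in classifier (an assumption): one mean vector per class."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict_one(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return 1 if dist(centroids[1]) < dist(centroids[0]) else 0

def bus_fit(X, y, M=11, seed=0):
    """Bagged under-sampling: train one model on each of M balanced subsamples."""
    rng = random.Random(seed)
    return [fit_centroids(*under_sample(X, y, rng)) for _ in range(M)]

def bus_predict(models, x):
    """Combine the M models by majority vote (the aggregation rule is an assumption)."""
    votes = sum(predict_one(m, x) for m in models)
    return 1 if 2 * votes > len(models) else 0
```

Each call to `under_sample` discards most of the majority set, so a single model sees only a small part of the data. Repeating the draw M times lets different majority samples contribute to the final vote.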

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
