Revisiting Gradient Boosting-Based Approaches for Learning Imbalanced Data: A Case of Anomaly Detection on Power Grids

Maya Hilda Lestari Louk; Bayu Adhi Tama

doi:10.3390/bdcc6020041

Big Data and Cognitive Computing (Apr 2022)

Affiliations

Maya Hilda Lestari Louk: Department of Informatics Engineering, University of Surabaya, Surabaya 60293, Indonesia
Bayu Adhi Tama: Data Science Group, Institute for Basic Science (IBS), Daejeon 34141, Korea

DOI: https://doi.org/10.3390/bdcc6020041
Journal volume & issue: Vol. 6, no. 2, p. 41

Abstract

Gradient boosting ensembles have been used in the cyber-security area for many years; nonetheless, their efficacy and accuracy for intrusion detection systems (IDSs) remain questionable, particularly when dealing with problems involving imbalanced data. This article addresses that gap in the existing body of knowledge by evaluating the performance of gradient boosting-based ensembles, including gradient boosting machine (GBM), extreme gradient boosting (XGBoost), LightGBM, and CatBoost. The ensembles are assessed on various imbalanced data sets using the Matthews correlation coefficient (MCC), area under the receiver operating characteristic curve (AUC), and F1 metrics. The article discusses an example of anomaly detection in an industrial control network and, more specifically, threat detection in a cyber-physical smart power grid. The test results indicate that CatBoost surpassed its competitors, regardless of the imbalance ratio of the data sets. LightGBM, by contrast, showed much lower performance and more variability across the data sets.
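
As a rough illustration of why MCC and F1 are preferred over plain accuracy on imbalanced data, the sketch below computes both from confusion-matrix counts using their standard definitions. The label vectors are made-up toy data (95 normal events, 5 attacks), not results from the article, and the helper names `confusion_counts`, `f1_score`, and `mcc` are hypothetical. AUC is omitted because it requires ranked scores rather than hard labels.

```python
import math

def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = attack, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def f1_score(tp, fp, fn):
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data (assumption): 95 normal events followed by 5 attacks.
y_true = [0] * 95 + [1] * 5

# A trivial detector that never raises an alarm.
always_normal = [0] * 100

# A detector with 2 false alarms that catches 4 of the 5 attacks.
detector = [1] * 2 + [0] * 93 + [1] * 4 + [0] * 1

for name, y_pred in [("always_normal", always_normal), ("detector", detector)]:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    print(name, round(accuracy, 3), round(f1_score(tp, fp, fn), 3),
          round(mcc(tp, tn, fp, fn), 3))
```

On this toy data the trivial detector reaches 95% accuracy while both F1 and MCC are zero, which is the kind of distortion these metrics are meant to expose.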

Published in Big Data and Cognitive Computing

ISSN: 2504-2289 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology
Website: http://www.mdpi.com/journal/BDCC
