Frontiers in Astronomy and Space Sciences (Jun 2023)

A selective up-sampling method applied upon unbalanced data for flare prediction: potential to improve model performance

  • Siwei Liu,
  • Jingjing Wang,
  • Ming Li,
  • Yanmei Cui,
  • Juan Guo,
  • Yurong Shi,
  • Bingxian Luo,
  • Siqing Liu

DOI
https://doi.org/10.3389/fspas.2023.1082694
Journal volume & issue
Vol. 10

Abstract

The Spaceweather HMI Active Region Patch (SHARP) parameters have been widely used to develop flare prediction models. Because strong-flare events are relatively rare, the resulting dataset is unbalanced, and prediction models can be sensitive to this imbalance, which may lead to bias and limited performance. In this study, we adopted the logistic regression algorithm to develop a model that predicts flares in the next 48 h based on the SHARP parameters. The model was trained with five different inputs. The first input was the original unbalanced dataset; the second and third inputs were obtained by applying two widely used sampling methods to the original dataset; the fourth input was the original dataset accompanied by a weighted classifier. Based on the distribution properties of strong-flare occurrences with respect to the SHARP parameters, we established a new selective up-sampling method and applied it to the mixed-up region (the confusing distribution areas that contain both strong-flare and non-strong-flare events): we picked out the flare-related samples there and added small random values to them, creating a large number of flare-related samples that are very close to the ground truth. This yielded the fifth, balanced dataset, which aims to 1) improve the forecast capability in the mixed-up region and 2) increase the robustness of the model. We compared the model performance across the five inputs and found that the selective up-sampling method has the potential to improve strong-flare prediction, with its F1 score reaching 0.5501 ± 0.1200, approximately 22%-33% higher than those of the other imbalance mitigation schemes.
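The selective up-sampling idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `selective_upsample`, the single-feature interval `[lo, hi]` used to stand in for the mixed-up region, the uniform jitter, and the toy data are all assumptions made for illustration. The abstract does not specify how the region is delimited or how large the random values are.

```python
import random

def selective_upsample(X, y, feature, lo, hi, n_copies=5, jitter=0.01, seed=0):
    """Append jittered copies of positive samples that fall in an assumed mixed-up region.

    The region is modeled (as an assumption) by lo <= row[feature] <= hi.
    """
    rng = random.Random(seed)
    X_new = [list(row) for row in X]  # keep all original samples unchanged
    y_new = list(y)
    for row, label in zip(X, y):
        # Only strong-flare samples (label 1) inside the region are up-sampled.
        if label == 1 and lo <= row[feature] <= hi:
            for _ in range(n_copies):
                # Add a small random value to every feature of the copied sample.
                X_new.append([v + rng.uniform(-jitter, jitter) for v in row])
                y_new.append(1)
    return X_new, y_new

# Toy data: one hypothetical normalized parameter per sample (not SHARP data).
X = [[0.1], [0.2], [0.45], [0.5], [0.55], [0.9]]
y = [0, 0, 0, 1, 0, 1]
X_bal, y_bal = selective_upsample(X, y, feature=0, lo=0.4, hi=0.6, n_copies=3)
```

In this toy case only the positive sample at 0.5 lies inside the assumed region, so it alone is copied; the positive sample at 0.9, which sits outside the region, is left as is. That is the sense in which the up-sampling is selective rather than applied to all minority-class samples.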

Keywords