Advanced Intelligent Systems (Jul 2023)

Power‐Law‐Based Synthetic Minority Oversampling Technique on Imbalanced Serum Surface‐Enhanced Raman Spectroscopy Data for Cancer Screening

  • Changbin Pan,
  • Kaiming Peng,
  • Tong Chen,
  • Guannan Chen,
  • Yuxiang Lin,
  • Qiyi Zhang,
  • Miaomiao Liu,
  • Duo Lin,
  • Tingyin Wang,
  • Shangyuan Feng

DOI
https://doi.org/10.1002/aisy.202300006
Journal volume & issue
Vol. 5, no. 7

Abstract

Surface‐enhanced Raman spectroscopy (SERS) has shown great promise for cancer screening. However, previous “proof‐of‐concept” studies ignored the natural imbalance of cancer types in the population, which biases a model toward learning the features of the majority classes at the expense of the minority classes. Herein, a power‐law‐based synthetic minority oversampling technique (PL‐SMOTE) is proposed to guide the resampling of multiclass serum SERS data by analyzing the long‐tailed (power‐law) distribution of cancer prevalence in the population. By introducing a modulating factor, PL‐SMOTE balances the number of minority samples to resample against the amount of overlap between classes. Models trained on datasets resampled with PL‐SMOTE verify the effectiveness of the method. After further fine‐tuning of the parameters of the deep neural network model and of PL‐SMOTE, an optimal cancer screening model is obtained, with a macroaveraged Recall of 97.24% and a macroaveraged F2‐Score of 97.38%. The work provides a new method for multiclass imbalanced resampling that significantly improves model performance in SERS cancer screening. The method may also inspire work in other multiclass imbalanced scenarios, such as biomedicine, anomaly detection, and disaster prediction.
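For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how a macroaveraged Recall and a macroaveraged F2‐Score can be computed from a multiclass confusion matrix. It assumes the common convention of averaging per‐class scores with equal weight; the abstract does not state the exact averaging procedure, so this is not necessarily the authors' computation. The function name `macro_recall_and_fbeta` and the toy confusion matrix are hypothetical and are not taken from the paper.

```python
def macro_recall_and_fbeta(confusion, beta=2.0):
    """Return (macro recall, macro F-beta) for a square confusion matrix.

    confusion[i][j] = number of samples whose true class is i and whose
    predicted class is j. Assumption: macro averaging means the unweighted
    mean of per-class scores, so rare classes count as much as common ones.
    """
    n = len(confusion)
    recalls, fbetas = [], []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                          # class k samples missed
        fp = sum(confusion[i][k] for i in range(n)) - tp     # other samples labeled k
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        denom = beta ** 2 * precision + recall
        # F-beta weights recall beta times as heavily as precision (beta=2 gives F2).
        fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
        recalls.append(recall)
        fbetas.append(fbeta)
    return sum(recalls) / n, sum(fbetas) / n


# Hypothetical, imbalanced 3-class example (illustrative numbers only).
toy_confusion = [
    [90, 5, 5],   # majority class
    [2, 8, 0],    # minority class
    [1, 0, 4],    # minority class
]
macro_recall, macro_f2 = macro_recall_and_fbeta(toy_confusion, beta=2.0)
print(f"macro recall = {macro_recall:.4f}, macro F2 = {macro_f2:.4f}")
```

Because each class contributes equally to the average, poor performance on a rare cancer type lowers these scores as much as poor performance on a common one, which is why such metrics suit the imbalanced setting the abstract describes. The F2 variant additionally favors recall over precision, a reasonable choice when a missed case is costlier than a false alarm.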

Keywords