Jisuanji kexue (Jan 2022)

Imbalanced Data Classification: A Survey and Experiments in Medical Domain

  • JIANG Hao-chen, WEI Zi-qi, LIU Lin, CHEN Jun

DOI
https://doi.org/10.11896/jsjkx.210200124
Journal volume & issue
Vol. 49, no. 1
pp. 80 – 88

Abstract


In recent years, AI technology has been widely adopted in many application domains. Among these, intelligent medical applications such as clinical decision support systems have attracted much attention. However, the current wave of AI applications is based on predictive models learned from historical data, so the characteristics and quality of that data directly affect how well these applications perform. Medical data are inherently imbalanced: rare disease cases are always scarce in existing case archives, yet they are considered more important. The "data imbalance problem" is still regarded as a difficult research problem in machine learning. This paper reviews the literature on techniques for handling "imbalanced data", both in general and in the intelligent medical area. We then use research publications from SIGKDD, the conference dedicated to knowledge discovery and data mining, as a sample pool to find which approaches researchers prefer for addressing the "imbalanced data" problem in a given domain. Finally, based on the approaches identified in the survey, we conduct experiments on two typical medical predictive-model learning scenarios to validate the know-how acquired in this study.

Keywords