IEEE Access (Jan 2024)

Feature Distribution-Based Medical Data Augmentation: Enhancing Mood Disorder Classification

  • Joo Hun Yoo
  • Ji Hyun An
  • Tai-Myoung Chung

DOI
https://doi.org/10.1109/ACCESS.2024.3396138
Journal volume & issue
Vol. 12
pp. 127782 – 127791

Abstract

Classification models built with deep learning or other machine learning algorithms need a sufficiently large and balanced training dataset to perform well, but data privacy restrictions make such data hard to collect. The problem is sharper in medical research, where most variables are sensitive information, so gathering enough training data to improve model performance is even more challenging. This study presents a new four-step medical data augmentation algorithm that addresses both data shortage and class imbalance. Its main idea is to preserve the core characteristics of each class label in the original data. The algorithm takes an original dataset as input, extracts the feature vectors, and trains an individual autoencoder model. It then verifies each augmented feature vector with a distributional equality check and concatenates the verified feature vectors into a single feature vector. As a second verification, deep learning model inference is applied to the concatenated vector to finalize the augmented training dataset. To evaluate the algorithm, we performed mood disorder classification on patient data. With the method, classification performance improved by 0.059 in the severity classification of major depressive disorder, 0.041 in the severity classification of anxiety disorder, and 0.073 in the subtype classification of bipolar disorder. These results show that the algorithm can be applied to minimize model bias and improve classification performance on medical data that are imbalanced or insufficient in number per class.
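
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several parts are assumptions: the abstract does not name the distributional equality test, so a two-sample Kolmogorov-Smirnov check is used here; resampling with Gaussian jitter stands in for the per-feature autoencoder; and a nearest-centroid rule stands in for the deep learning model used in the second verification. All function names and the class labels in the usage note are hypothetical.

```python
import random
import statistics


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def same_distribution(real, synth, c_alpha=1.358):
    """Assumed equality check: KS statistic below the asymptotic 5% critical value."""
    n, m = len(real), len(synth)
    return ks_statistic(real, synth) < c_alpha * ((n + m) / (n * m)) ** 0.5


def augment_feature(values, n_new, rng, noise=0.1):
    """Stand-in for the per-feature autoencoder: resample and add small jitter."""
    sd = statistics.pstdev(values) or 1.0
    return [rng.choice(values) + rng.gauss(0.0, noise * sd) for _ in range(n_new)]


def augment_class(rows, n_new, rng, max_tries=20):
    """Augment each feature, keep it only if it passes the check, then concatenate."""
    new_columns = []
    for col in zip(*rows):
        col = list(col)
        for _ in range(max_tries):
            cand = augment_feature(col, n_new, rng)
            if same_distribution(col, cand):
                break
        else:
            raise RuntimeError("no candidate passed the distribution check")
        new_columns.append(cand)
    # Concatenate the verified per-feature vectors back into full rows.
    return [list(r) for r in zip(*new_columns)]


def centroid(rows):
    return [statistics.fmean(c) for c in zip(*rows)]


def predict(row, centroids):
    """Stand-in for model inference: label of the nearest class centroid."""
    def dist(label):
        return sum((x - y) ** 2 for x, y in zip(row, centroids[label]))
    return min(centroids, key=dist)


def augment_dataset(data, n_new, seed=0):
    """Second verification: keep synthetic rows predicted as their intended class."""
    rng = random.Random(seed)
    centroids = {label: centroid(rows) for label, rows in data.items()}
    out = {}
    for label, rows in data.items():
        candidates = augment_class(rows, n_new, rng)
        out[label] = [r for r in candidates if predict(r, centroids) == label]
    return out
```

Calling `augment_dataset({"mild": mild_rows, "severe": severe_rows}, n_new=30)` with lists of numeric rows returns, for each class, up to 30 synthetic rows that passed both checks. Because each feature is generated independently in this sketch, correlations between features are not preserved; a real implementation would need to address that.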

Keywords