Feature Distribution-Based Medical Data Augmentation: Enhancing Mood Disorder Classification

Joo Hun Yoo; Ji Hyun An; Tai-Myoung Chung

doi:10.1109/ACCESS.2024.3396138

IEEE Access (Jan 2024)

Affiliations

Joo Hun Yoo: Department of Artificial Intelligence, Sungkyunkwan University, Suwon-si, Republic of Korea
Ji Hyun An: Department of Psychiatry, Sungkyunkwan University School of Medicine, Seoul, Republic of Korea
Tai-Myoung Chung: Department of Computer Science and Engineering, Sungkyunkwan University, Suwon-si, Republic of Korea

DOI: https://doi.org/10.1109/ACCESS.2024.3396138
Journal volume & issue: Vol. 12
pp. 127782 – 127791

Abstract

Classification models built on deep learning or machine learning algorithms need a sufficient and balanced training dataset to perform well, yet data collection is often constrained by privacy concerns. In medical research, where most variables are sensitive, collecting enough training data is even more challenging. This study presents a new four-step medical data augmentation algorithm that addresses data shortage and class imbalance. The main idea of the algorithm is to reflect the core characteristics of the original data’s class label. The algorithm takes the original dataset as input, extracts the feature vectors, and trains an individual autoencoder model. It then verifies each augmented feature vector through a distributional equality check and concatenates the feature vectors into a single vector. As a second verification, deep learning model inference is applied to the concatenated vector to finalize the augmented training dataset. To evaluate the algorithm, we performed mood disorder classification on patient data. With the method, classification performance improved by 0.059 in the severity classification of major depressive disorder, 0.041 in the severity classification of anxiety disorder, and 0.073 in the subtype classification of bipolar disorder. Through this study, we show that the algorithm can reduce model bias and improve classification performance on medical data that are imbalanced or insufficient in number per class.
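
The flow described in the abstract (generate per-feature candidates, check distributional equality, concatenate, then verify with a classifier) can be illustrated with a minimal sketch. It is not the authors' implementation, and everything named here is an assumption made for illustration: jittered resampling stands in for the autoencoder, a two-sample Kolmogorov-Smirnov check stands in for the distributional equality check (the abstract does not say which test is used), a nearest-centroid rule stands in for the deep learning model, and the toy data and function names are hypothetical.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def same_distribution(a, b, c_alpha=1.358):
    """Accept when D is within the asymptotic critical value (c_alpha=1.358 is about alpha=0.05)."""
    n, m = len(a), len(b)
    return ks_statistic(a, b) <= c_alpha * ((n + m) / (n * m)) ** 0.5


def augment_feature(values, n_new, rng, noise=0.05):
    """Stand-in generator (assumption, NOT an autoencoder): resample with Gaussian jitter."""
    spread = (max(values) - min(values)) or 1.0
    return [rng.choice(values) + rng.gauss(0.0, noise * spread) for _ in range(n_new)]


def centroid(rows):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def predict(row, centroids):
    """Stand-in classifier (assumption, NOT a deep model): nearest class centroid."""
    def dist(c):
        return sum((x - y) ** 2 for x, y in zip(row, c))
    return min(centroids, key=lambda label: dist(centroids[label]))


def augment_class(data, label, n_new, seed=0, max_tries=20):
    """Augment one class: per-feature generation and check, concatenation, classifier check."""
    rng = random.Random(seed)
    columns = []
    for col in zip(*data[label]):          # one feature at a time
        col = list(col)
        for _ in range(max_tries):
            candidate = augment_feature(col, n_new, rng)
            if same_distribution(col, candidate):   # first verification
                columns.append(candidate)
                break
        else:
            raise RuntimeError("no candidate passed the distribution check")
    rows = [list(r) for r in zip(*columns)]         # concatenate features into samples
    centroids = {lab: centroid(rs) for lab, rs in data.items()}
    return [r for r in rows if predict(r, centroids) == label]  # second verification


# Hypothetical toy data: two features per sample, "severe" is the minority class.
data = {
    "mild": [[1.0, 2.0], [1.2, 2.1], [0.9, 1.8], [1.1, 2.2], [1.3, 1.9], [0.8, 2.0]],
    "severe": [[5.0, 6.0], [5.2, 6.3], [4.8, 5.9]],
}
augmented = augment_class(data, "severe", n_new=6)
print(len(augmented), "augmented samples kept for class 'severe'")
```

With very small samples the Kolmogorov-Smirnov threshold is loose, so in practice the check would need enough original samples per feature to be informative.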

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639