Mathematics (May 2023)

FedISM: Enhancing Data Imbalance via Shared Model in Federated Learning

  • Wu-Chun Chung,
  • Yan-Hui Lin,
  • Sih-Han Fang

DOI
https://doi.org/10.3390/math11102385
Journal volume & issue
Vol. 11, no. 10
p. 2385

Abstract

Read online

Considering the sensitivity of data in medical scenarios, federated learning (FL) is suitable for applications that require data privacy. Medical personnel can use the FL framework for machine learning to assist in analyzing large-scale data that are protected within the institution. However, not all clients have the same distribution of datasets, so data imbalance problems occur among clients. The main challenge is to overcome the performance degradation caused by low accuracy and the inability to converge the model. This paper proposes a FedISM method to enhance performance in the case of Non-Independent Identically Distribution (Non-IID). FedISM exploits a shared model trained on a candidate dataset before performing FL among clients. The Candidate Selection Mechanism (CSM) was proposed to effectively select the most suitable candidate among clients for training the shared model. Based on the proposed approaches, FedISM not only trains the shared model without sharing any raw data, but it also provides an optimal solution through the selection of the best shared model. To evaluate performance, the proposed FedISM was applied to classify coronavirus disease (COVID), pneumonia, normal, and viral pneumonia in the experiments. The Dirichlet process was also used to simulate a variety of imbalanced data distributions. Experimental results show that FedISM improves accuracy by up to 25% when privacy concerns regarding patient data are rising among medical institutions.

Keywords