Data Poison Detection Schemes for Distributed Machine Learning

Yijin Chen; Yuming Mao; Haoyang Liang; Shui Yu; Yunkai Wei; Supeng Leng

doi:10.1109/ACCESS.2019.2962525

IEEE Access (Jan 2020)

Affiliations

Yijin Chen: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Yuming Mao: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Haoyang Liang: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Shui Yu: School of Software, University of Technology Sydney, Sydney, NSW, Australia
Yunkai Wei: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China
Supeng Leng: School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu, China

Journal volume & issue: Vol. 8
pp. 7442 – 7454

Abstract

Distributed machine learning (DML) makes it possible to train on massive datasets when no single node can produce accurate results within an acceptable time. However, it inevitably exposes more potential targets to attackers than a non-distributed environment does. In this paper, we classify DML into basic-DML and semi-DML. In basic-DML, the center server dispatches learning tasks to distributed machines and aggregates their learning results. In semi-DML, the center server additionally devotes its own resources to dataset learning, on top of its basic-DML duties. We first propose a novel data poison detection scheme for basic-DML, which uses a cross-learning mechanism to identify the poisoned data. We prove that the proposed cross-learning mechanism generates training loops, and on that basis we establish a mathematical model to find the optimal number of training loops. Then, for semi-DML, we present an improved data poison detection scheme that provides better learning protection with the aid of the central resource. To use the system resources efficiently, we develop an optimal resource allocation approach. Simulation results show that the proposed scheme can significantly improve the accuracy of the final model, by up to 20% for support vector machines and 60% for logistic regression in the basic-DML scenario. Moreover, in the semi-DML scenario, the improved data poison detection scheme with optimal resource allocation can reduce wasted resources by 20-100%.
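
The difference between the two settings described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the toy learner `fit_slope`, the round-robin `split`, and the use of plain averaging as the aggregation rule are all assumptions made for illustration, and no poison detection is modeled here.

```python
from statistics import fmean

def fit_slope(points):
    """Least-squares slope through the origin for (x, y) pairs.
    Hypothetical stand-in for whatever learner each node runs."""
    num = sum(x * y for x, y in points)
    den = sum(x * x for x, _ in points)
    return num / den

def split(data, parts):
    """Deal the samples round-robin into `parts` sub-datasets (assumed scheme)."""
    return [data[i::parts] for i in range(parts)]

def basic_dml(data, n_workers):
    """Basic-DML: the center only dispatches sub-datasets and aggregates results."""
    results = [fit_slope(chunk) for chunk in split(data, n_workers)]
    return fmean(results)  # averaging is an assumed aggregation rule

def semi_dml(data, n_workers):
    """Semi-DML: the center also trains on one share of the data itself."""
    center_chunk, *worker_chunks = split(data, n_workers + 1)
    results = [fit_slope(center_chunk)]
    results += [fit_slope(chunk) for chunk in worker_chunks]
    return fmean(results)

# Clean toy data lying exactly on y = 2x.
data = [(float(x), 2.0 * x) for x in range(1, 13)]
basic_model = basic_dml(data, n_workers=3)
semi_model = semi_dml(data, n_workers=3)
```

On clean data both settings recover the same slope; the settings differ only in whether the center contributes a locally trained result of its own.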

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
