网络与信息安全学报 (Jun 2024)

Survey of split learning data privacy

  • QIN Yiqun
  • MA Xiaojing
  • FU Jiayun
  • HU Pingyi
  • XU Peng
  • JIN Hai

Journal volume & issue
Vol. 10, pp. 20–37

Abstract

With the rapid development of machine learning, artificial intelligence technology has been widely applied across many domains of life, yet concerns about the privacy risks it poses have grown accordingly. In response, the Personal Information Protection Law of the People's Republic of China was promulgated to regulate the collection, use, and transmission of private information. Because machine learning still requires large amounts of data, privacy-preserving technologies are needed so that data can be collected and processed under legal and compliant conditions. Split learning, a privacy-preserving machine learning technique that trains a model distributed among multiple participants without sharing raw data, has therefore become a research focus. However, split learning has been shown to be vulnerable to data privacy attacks, and various attacks and corresponding defenses have been proposed, yet existing surveys have not discussed or summarized research on data privacy during the training phase of split learning. This article offers a comprehensive overview of data privacy attack and defense techniques in the training phase of split learning. First, the definition, principles, and classifications of split learning are summarized. Then, two common attacks on split learning, the raw data reconstruction attack and the label leakage attack, are introduced; the causes of these attacks in the training phase are analyzed and the corresponding defenses are presented. Finally, future research directions in split learning data privacy are discussed.
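For readers unfamiliar with the split learning protocol referenced in the abstract, the following is a minimal sketch of one training step in the vanilla two-party setting, assuming the client holds the raw data, the labels, and the bottom layers of the model while the server holds the top layers; all layer sizes, data, and variable names are illustrative rather than taken from the surveyed works (in the U-shaped variant, by contrast, the labels would remain with the client).

```python
# Minimal sketch of one training step in vanilla (two-party) split learning.
# Assumptions: the client keeps the raw inputs and labels and the bottom
# layers; the server keeps the top layers and receives the cut-layer
# activations ("smashed data") plus labels. Sizes and data are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

# Client-side bottom model: raw features -> cut-layer activations
client_model = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
# Server-side top model: cut-layer activations -> class logits
server_model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

client_opt = torch.optim.SGD(client_model.parameters(), lr=0.1)
server_opt = torch.optim.SGD(server_model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# A toy private batch held only by the client
x = torch.randn(16, 32)          # raw inputs (never leave the client)
y = torch.randint(0, 10, (16,))  # labels (shared in this vanilla variant)

# --- Client forward pass: compute and send the smashed data ---
smashed = client_model(x)
smashed_sent = smashed.detach().requires_grad_()  # tensor that crosses the network

# --- Server forward/backward pass: compute loss and the cut-layer gradient ---
server_opt.zero_grad()
logits = server_model(smashed_sent)
loss = loss_fn(logits, y)
loss.backward()                   # fills smashed_sent.grad
server_opt.step()

# --- Client backward pass: continue backpropagation from the returned gradient ---
client_opt.zero_grad()
smashed.backward(smashed_sent.grad)  # gradient sent back over the network
client_opt.step()

print(f"loss = {loss.item():.4f}")
```

In this sketch only the cut-layer activations (smashed_sent) and the gradient returned for them (smashed_sent.grad) cross the party boundary; the raw data reconstruction and label leakage attacks discussed in the article exploit exactly these exchanged values.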

Keywords