网络与信息安全学报 (Apr 2023)
Survey on vertical federated learning: algorithm, privacy and security
Abstract
Federated learning (FL) is a distributed machine learning technology that enables institutions to jointly build machine learning models by exchanging intermediate results (e.g., model parameters, parameter gradients, and embedding representations) computed on data distributed across those institutions. FL reduces the risk of privacy leakage because raw data never leaves the institution that holds it. According to how data is distributed among institutions, FL is usually divided into horizontal federated learning (HFL), vertical federated learning (VFL), and federated transfer learning (FTL). VFL suits scenarios where institutions share the same sample space but hold different feature spaces, and it is widely used in fields such as medical diagnosis, finance, and security. Although VFL performs well in real-world applications, it still faces many privacy and security challenges. To the best of our knowledge, no comprehensive survey has been conducted on privacy and security methods for VFL. Existing VFL methods were analyzed from four perspectives: the basic framework, communication mechanism, alignment mechanism, and label processing mechanism. Then the privacy and security risks faced by VFL and the related defense methods were introduced and analyzed. Additionally, common datasets and evaluation metrics suitable for VFL, as well as platform frameworks, were presented. Considering the existing challenges and problems, the future directions and development trends of VFL were outlined to provide a reference for theoretical research on building efficient, robust, and secure VFL.