Privacy-Preserving Machine Learning on Apache Spark

Claudia V. Brito; Pedro G. Ferreira; Bernardo L. Portela; Rui C. Oliveira; Joao T. Paulo

doi:10.1109/ACCESS.2023.3332222

IEEE Access (Jan 2023)

Affiliations

Claudia V. Brito: INESC TEC, Porto, Portugal
Pedro G. Ferreira: INESC TEC, Porto, Portugal
Bernardo L. Portela: INESC TEC, Porto, Portugal
Rui C. Oliveira: INESC TEC, Porto, Portugal
Joao T. Paulo: INESC TEC, Porto, Portugal

DOI: https://doi.org/10.1109/ACCESS.2023.3332222
Journal volume & issue: Vol. 11, pp. 127907–127930

Abstract

The adoption of third-party machine learning (ML) cloud services is highly dependent on the security guarantees and the performance penalty they incur on workloads for model training and inference. This paper explores security/performance trade-offs for the distributed Apache Spark framework and its ML library. Concretely, we build upon a key insight: in specific deployment settings, one can reveal carefully chosen non-sensitive operations (e.g. statistical calculations). This allows us to considerably improve the performance of privacy-preserving solutions without exposing the protocol to pervasive ML attacks. In more detail, we propose Soteria, a system for distributed privacy-preserving ML that leverages Trusted Execution Environments (e.g. Intel SGX) to run computations over sensitive information in isolated containers (enclaves). Unlike previous work, where all ML-related computation is performed at trusted enclaves, we introduce a hybrid scheme, combining computation done inside and outside these enclaves. The experimental evaluation validates that our approach reduces the runtime of ML algorithms by up to 41% when compared to previous related work. Our protocol is accompanied by a security proof and a discussion regarding resilience against a wide spectrum of ML attacks.

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
