On the Privacy–Utility Trade-Off in Differentially Private Hierarchical Text Classification

Dominik Wunderlich; Daniel Bernau; Francesco Aldà; Javier Parra-Arnau; Thorsten Strufe

doi:10.3390/app122111177

Applied Sciences (Nov 2022)

Affiliations

Dominik Wunderlich: SAP SE, 76131 Karlsruhe, Germany
Daniel Bernau: SAP SE, 76131 Karlsruhe, Germany
Francesco Aldà: SAP SE, 76131 Karlsruhe, Germany
Javier Parra-Arnau: Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany
Thorsten Strufe: Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany

Journal volume & issue: Vol. 12, no. 21, p. 11177

Abstract

Hierarchical text classification consists of classifying text documents into a hierarchy of classes and sub-classes. Although artificial neural networks have proved useful for this task, they can memorize training data and thereby leak information about it to adversaries. Using differential privacy during model training can mitigate leakage attacks against trained models, enabling the models to be shared safely at the cost of reduced model accuracy. This work investigates the privacy–utility trade-off in hierarchical text classification with differential privacy guarantees, and it identifies neural network architectures that offer superior trade-offs. To this end, we use a white-box membership inference attack to empirically assess the information leakage of three widely used neural network architectures. We show that large differential privacy parameters already suffice to completely mitigate membership inference attacks, and thus that protection comes at only a moderate decrease in model utility. More specifically, for large datasets with long texts, we observed Transformer-based models to achieve an overall favorable privacy–utility trade-off, while for smaller datasets with shorter texts, convolutional neural networks are preferable.
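
The abstract does not state which training mechanism was used. As an illustration only, the sketch below assumes the widely used DP-SGD recipe: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. The function names and the `clip_norm` and `noise_multiplier` parameters are hypothetical, and the sketch performs no privacy accounting.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average.

    Assumption: this mirrors the usual DP-SGD aggregation step; it is not
    taken from the paper and does not compute the resulting privacy budget.
    """
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, clip_norm)
        for i in range(dim):
            total[i] += clipped[i]
    # Noise standard deviation is proportional to the clipping norm.
    sigma = noise_multiplier * clip_norm
    return [(total[i] + rng.gauss(0.0, sigma)) / batch_size for i in range(dim)]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng))
```

A larger `noise_multiplier` yields a stronger formal guarantee but a noisier update, which is the privacy–utility trade-off the abstract refers to.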

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
