Entropy (Apr 2023)

On the Asymptotic Capacity of Information-Theoretic Privacy-Preserving Epidemiological Data Collection

  • Jiale Cheng,
  • Nan Liu,
  • Wei Kang

DOI
https://doi.org/10.3390/e25040625
Journal volume & issue
Vol. 25, no. 4
p. 625

Abstract

Read online

The paradigm-shifting developments of cryptography and information theory have focused on the privacy of data-sharing systems, such as epidemiological studies, where agencies are collecting far more personal data than they need, causing intrusions on patients’ privacy. To study the capability of the data collection while protecting privacy from an information theory perspective, we formulate a new distributed multiparty computation problem called privacy-preserving epidemiological data collection. In our setting, a data collector requires a linear combination of K users’ data through a storage system consisting of N servers. Privacy needs to be protected when the users, servers, and data collector do not trust each other. For the users, any data are required to be protected from up to E colluding servers; for the servers, any more information than the desired linear combination cannot be leaked to the data collector; and for the data collector, any single server can not know anything about the coefficients of the linear combination. Our goal is to find the optimal collection rate, which is defined as the ratio of the size of the user’s message to the total size of downloads from N servers to the data collector. For achievability, we propose an asymptotic capacity-achieving scheme when EN−1, by applying the cross-subspace alignment method to our construction; for the converse, we proved an upper bound of the asymptotic rate for all achievable schemes when EN−1. Additionally, we show that a positive asymptotic capacity is not possible when E≥N−1. The results of the achievability and converse meet when the number of users goes to infinity, yielding the asymptotic capacity. Our work broadens current researches on data privacy in information theory and gives the best achievable asymptotic performance that any epidemiological data collector can obtain.

Keywords