Data Science Journal (Dec 2018)
Computing Statistics from Private Data
Abstract
In several domains, privacy presents a significant obstacle to scientific and analytic research, and limits the economic, social, health and scholastic benefits that could be derived from such research. These concerns stem from the need for privacy about personally identifiable information (PII), commercial intellectual property, and other types of information. For example, businesses, researchers, and policymakers may benefit by analyzing aggregate information about markets, but individual companies may not be willing to reveal information about risks, strategies, and weaknesses that could be exploited by competitors. Extracting valuable utility from the new “big data” economy demands new privacy technologies to overcome barriers that impede sensitive data from being aggregated and analyzed. Secure multiparty computation (MPC) is a collection of cryptographic technologies that can be used to effectively cope with some of these obstacles, and provide a new means of allowing researchers to coordinate and analyze sensitive data collections, obviating the need for data-owners to share the underlying data sets with other researchers or with each other. This paper outlines the findings that were made during interdisciplinary workshops that examined potential applications of MPC to data in the social and health sciences. The primary goals of this work are to describe the computational needs of these disciplines and to develop a specific roadmap for selecting efficient algorithms and protocols that can be used as a starting point for interdisciplinary projects between cryptographers and data scientists.
Keywords