Journal of Bio-X Research (Sep 2021)

A differential privacy protection query language for medical data: a proof-of-concept system validation

  • Huanhuan Wang,
  • Yongting Zhang,
  • Hongsheng Yin,
  • Ruirui Li,
  • Xiang Wu

DOI
https://doi.org/10.1097/JBR.0000000000000099
Journal volume & issue
Vol. 4, no. 3
pp. 103 – 113

Abstract

Read online

Abstract. Objective:. Medical data mining and sharing is an important process in E-Health applications. However, because these data consist of a large amount of personal private information of patients, there is the risk of privacy disclosure when sharing and mining. Therefore, ensuring the security of medical big data in the process of publishing, sharing, and mining has become the focus of current research. The objective of our study is to design a framework based on a differential privacy protection mechanism to ensure the secure sharing of medical data. We developed a privacy protection query language (PQL) that integrates multiple data mining methods and provides a secure sharing function. Methods:. This study is mainly performed in Xuzhou Medical University, China and designs three sub-modules: a parsing module, mining module, and noising module. Each module encapsulates different computing methods, such as a composite parser and a noise theory. In the PQL framework, we apply the differential privacy theory to the results of the computing between modules to guarantee the security of various mining algorithms. These computing devices operate independently, but the mining results depend on their cooperation. In addition, PQL is encapsulated in MNSSp3 that is a data mining and security sharing platform and the data comes from public data sets, such as UCBI. The public data set (NCBI database) was used as the experimental data, and the data collection time was January 2020. Results:. We designed and developed a query language that provides functions for medical data mining, sharing, and privacy preservation. We theoretically proved the performance of the PQL framework. The experimental results show that the PQL framework can ensure the security of each mining result and the availability of the output results is above 97%. Conclusion:. Our framework enables medical data providers to securely share health data or treatment data and develops a usable query language, based on a differential privacy mechanism, that enables researchers to mine information securely using data mining algorithms.