Digital Health (May 2023)
Anonymization of whole slide images in histopathology for research and education
Abstract
Objective The exchange of health-related data is subject to regional laws and regulations, such as the General Data Protection Regulation (GDPR) in the EU or the Health Insurance Portability and Accountability Act (HIPAA) in the United States, resulting in non-trivial challenges for researchers and educators when working with these data. In pathology, the digitization of diagnostic tissue samples inevitably generates identifying data that can consist of sensitive but also acquisition-related information stored in vendor-specific file formats. Distribution and off-clinical use of these Whole Slide Images (WSIs) are usually done in these formats, as an industry-wide standardization such as DICOM is yet only tentatively adopted and slide scanner vendors currently do not provide anonymization functionality. Methods We developed a guideline for the proper handling of histopathological image data particularly for research and education with regard to the GDPR. In this context, we evaluated existing anonymization methods and examined proprietary format specifications to identify all sensitive information for the most common WSI formats. This work results in a software library that enables GDPR-compliant anonymization of WSIs while preserving the native formats. Results Based on the analysis of proprietary formats, all occurrences of sensitive information were identified for file formats frequently used in clinical routine, and finally, an open-source programming library with an executable CLI tool and wrappers for different programming languages was developed. Conclusions Our analysis showed that there is no straightforward software solution to anonymize WSIs in a GDPR-compliant way while maintaining the data format. We closed this gap with our extensible open-source library that works instantaneously and offline.