IEEE Access (Jan 2023)
Enhancing Security and Privacy Preservation of Sensitive Information in e-Health Datasets Using FCA Approach
Abstract
Advances in data collection, storage, and processing in e-Health systems have recently increased the importance and popularity of data mining in the health care field. However, the high sensitivity of the handled and shared data brings a high risk of information disclosure and exposure. This major information security threat makes it necessary to hide sensitive relationships by modifying the shared data. Because a large number of data mining activities that attempt to identify interesting patterns in databases depend on locating frequent itemsets, further investigation of frequent itemsets requires privacy-preserving techniques. Exact and heuristic algorithms have been used to solve many difficult combinatorial problems, such as the data distribution problem. Exact algorithms have been studied and are considered optimal for such problems; however, they suffer from a scalability bottleneck, as they are limited to medium-sized instances. Heuristic algorithms, on the other hand, are scalable, but they perform poorly on security and privacy preservation. This paper proposes a novel heuristic approach based on Formal Concept Analysis (FCA) for enhancing security and privacy preservation of sensitive e-Health information using itemset hiding techniques. Our approach, named FCAHS (FCA Hiding Sensitive-itemsets), uses constraints to minimise side effects and asymmetry between the original database and the clean database (minimal distortion of the database). Moreover, our approach does not require frequent itemset extraction before the masking process. This gives the proposed approach an advantage in terms of total availability. We tested our FCAHS heuristic on various reference datasets.
Extensive experimental results showed the effectiveness of the proposed masking approach and the time efficiency of itemset extraction, making it very promising for e-Health sensitive data security and privacy.
Keywords