Big Data and Cognitive Computing (Oct 2023)
An Empirical Study on Core Data Asset Identification in Data Governance
Abstract
Data governance aims to optimize the value derived from data assets and effectively mitigate data-related risks. The rapid growth of data assets increases the risk of data breaches. One key solution to reduce this risk is to classify data assets according to their business value and criticality to the enterprises, allocating limited resources to protect core data assets. The existing methods rely on the experience of professionals and cannot identify core data assets across business scenarios. This work conducts an empirical study to address this issue. First, we utilized data lineage graphs with expert-labeled core data assets to investigate the experience of data users on core data asset identification from a scenario perspective. Then, we explored the structural features of core data assets on data lineage graphs from an abstraction perspective. Finally, one expert seminar was conducted to derive a set of universal indicators to identify core data assets by synthesizing the results from the two perspectives. User and field studies were conducted to demonstrate the effectiveness of the indicators.
Keywords