Scientific African (Sep 2023)
Data lake governance using IBM-Watson knowledge catalog
Abstract
The strategic importance of data in decision-making is increasingly recognized, creating demand for efficient solutions such as data catalogs that support data governance and data interoperability in accordance with the FAIR (Findable, Accessible, Interoperable, and Reusable) principles. However, because FAIR-compliant data catalogs are still new, empirical studies of their use are scarce. This study aims to promote the practical adoption of data catalogs as a means of managing the expanding data landscape. Our contribution is an empirical evaluation of IBM Watson Knowledge Catalog (IBM-WKC), a leading data cataloging solution, and a comparison with two other prominent alternatives, Open-Metadata and Data-Galaxy, on the task of extracting relevant information from data lakes containing heterogeneous data sources in their native formats. The proposed methodology uses a tool built on IBM-WKC to annotate the collected documents. We evaluated the approach experimentally on a dataset of 100 documents sourced from scientific databases, comparing the retrieved text to the appropriate interventions that use the original checklist. In these experiments, IBM-WKC outperformed both competitors on the data cataloging task. Notably, the tested queries achieved an accuracy, precision, and recall of 96%. These findings support the reliability of IBM-WKC and its alignment with the FAIR principles.
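The abstract reports accuracy, precision, and recall but does not state how they were computed. As a minimal sketch, assuming the standard set-based definitions for a retrieval query, the three metrics could be derived as follows. The function name `retrieval_metrics` and the toy document ids are hypothetical and are not taken from the study's data or tooling.

```python
def retrieval_metrics(retrieved, relevant, corpus):
    """Compute accuracy, precision, and recall for one query.

    Assumes `retrieved` and `relevant` are both subsets of `corpus`.
    """
    retrieved, relevant, corpus = set(retrieved), set(relevant), set(corpus)
    tp = len(retrieved & relevant)           # relevant documents that were returned
    fp = len(retrieved - relevant)           # returned documents that are not relevant
    fn = len(relevant - retrieved)           # relevant documents that were missed
    tn = len(corpus - retrieved - relevant)  # irrelevant documents correctly left out
    accuracy = (tp + tn) / len(corpus) if corpus else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical toy corpus of 10 document ids (illustrative only, not the paper's dataset).
corpus = range(10)
relevant = {0, 1, 2, 3}    # documents a human annotator marked as relevant
retrieved = {0, 1, 2, 5}   # documents the catalog query returned
metrics = retrieval_metrics(retrieved, relevant, corpus)
```

Under these definitions, accuracy counts correctly excluded documents as well as correctly returned ones, which is why it can differ from precision and recall on the same query.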