Informatics in Medicine Unlocked (Jan 2023)

Curator – A data curation tool for clinical real-world evidence

  • Antonella Delmestri,
  • Daniel Prieto-Alhambra

Journal volume & issue: Vol. 40, p. 101291

Abstract


Objective: This research aims to establish an efficient, systematic, reproducible and transparent solution for the advanced curation of real-world data, which are highly complex and represent an invaluable source of information for academia and industry.

Materials and methods: We propose a novel software solution that splits the statistical analysis pipeline into two phases. The first phase is implemented through Curator, which performs data engineering and data modelling on deidentified real-world data to achieve advanced curation, and provides selected information ready to be analysed by statistical packages in the second phase. Curator consists of a suite of Python programs and uses MySQL as its database management system. Curator has been applied to several UK primary and secondary care data sources.

Results: Curator has been used in 25 completed clinical and health economics research studies. Their output has been published in 2 NIHR-funded reports and 33 international peer-reviewed journals, and presented at 38 international conferences. Curator has consistently reduced research time and costs by over 36% and made research more reproducible and transparent.

Discussion: Curator fits well with recent UK government guidelines that recognise health data curation as a complex, standalone technical challenge. Curator has been used extensively on UK real-world data and can handle several linked datasets. However, to reach a wider audience, Curator needs to become more user-friendly.

Conclusion: Curator has proven to be a cost-effective and trustworthy data curation tool, which should be developed further and made available to third parties.
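The two-phase split described under Materials and methods can be illustrated with a minimal sketch. This is not Curator's code: Python's built-in sqlite3 module stands in for MySQL, and the tables (`patient`, `event`, `codelist`), the diagnosis codes and the functions `make_demo_db`, `phase1_curate` and `export_for_phase2` are all hypothetical. Phase 1 derives an analysis-ready cohort inside the database; phase 2 is represented only by the CSV hand-off that a statistical package would read.

```python
import csv
import io
import sqlite3

# Hypothetical illustration only: sqlite3 stands in for MySQL, and all table,
# column and function names are invented, not taken from Curator.


def make_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database of made-up, deidentified records."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, birth_year INTEGER);
        CREATE TABLE event (patient_id INTEGER, code TEXT, event_date TEXT);
        CREATE TABLE codelist (code TEXT);
        INSERT INTO patient VALUES (1, 1950), (2, 1962), (3, 1971);
        INSERT INTO event VALUES
            (1, 'E11', '2015-03-01'), (1, 'E11', '2012-07-15'),
            (2, 'I10', '2018-01-20'), (3, 'E11', '2020-11-05');
        INSERT INTO codelist VALUES ('E11');
    """)
    return conn


def phase1_curate(conn: sqlite3.Connection) -> list:
    """Phase 1: data engineering and modelling inside the database.

    Builds a cohort of patients with any event whose code is in the codelist,
    keeping the earliest such event date, and returns only those columns.
    """
    cur = conn.cursor()
    cur.executescript("""
        DROP TABLE IF EXISTS cohort;
        CREATE TABLE cohort AS
        SELECT p.patient_id,
               p.birth_year,
               MIN(e.event_date) AS first_diagnosis  -- ISO dates sort as text
        FROM patient AS p
        JOIN event AS e ON e.patient_id = p.patient_id
        WHERE e.code IN (SELECT code FROM codelist)
        GROUP BY p.patient_id, p.birth_year;
    """)
    return cur.execute(
        "SELECT patient_id, birth_year, first_diagnosis FROM cohort ORDER BY patient_id"
    ).fetchall()


def export_for_phase2(rows: list, header: list) -> str:
    """Serialise the curated extract as CSV for a statistical package (phase 2)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    conn = make_demo_db()
    rows = phase1_curate(conn)
    print(export_for_phase2(rows, ["patient_id", "birth_year", "first_diagnosis"]))
```

Keeping the curation logic in SQL and exporting only the selected columns is one way to realise the separation the abstract describes: the statistical analysis itself would then run in a separate package on the exported file, without touching the raw records.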

Keywords