Journal of Cheminformatics (Jul 2011)

Data governance in predictive toxicology: A review

  • Fu Xin,
  • Wojak Anna,
  • Neagu Daniel,
  • Ridley Mick,
  • Travis Kim

DOI
https://doi.org/10.1186/1758-2946-3-24
Journal volume & issue
Vol. 3, no. 1
p. 24

Abstract

Read online

Abstract Background Due to recent advances in data storage and sharing for further data processing in predictive toxicology, there is an increasing need for flexible data representations, secure and consistent data curation and automated data quality checking. Toxicity prediction involves multidisciplinary data. There are hundreds of collections of chemical, biological and toxicological data that are widely dispersed, mostly in the open literature, professional research bodies and commercial companies. In order to better manage and make full use of such large amount of toxicity data, there is a trend to develop functionalities aiming towards data governance in predictive toxicology to formalise a set of processes to guarantee high data quality and better data management. In this paper, data quality mainly refers in a data storage sense (e.g. accuracy, completeness and integrity) and not in a toxicological sense (e.g. the quality of experimental results). Results This paper reviews seven widely used predictive toxicology data sources and applications, with a particular focus on their data governance aspects, including: data accuracy, data completeness, data integrity, metadata and its management, data availability and data authorisation. This review reveals the current problems (e.g. lack of systematic and standard measures of data quality) and desirable needs (e.g. better management and further use of captured metadata and the development of flexible multi-level user access authorisation schemas) of predictive toxicology data sources development. The analytical results will help to address a significant gap in toxicology data quality assessment and lead to the development of novel frameworks for predictive toxicology data and model governance. Conclusions While the discussed public data sources are well developed, there nevertheless remain some gaps in the development of a data governance framework to support predictive toxicology. In this paper, data governance is identified as the new challenge in predictive toxicology, and a good use of it may provide a promising framework for developing high quality and easy accessible toxicity data repositories. This paper also identifies important research directions that require further investigation in this area.