International Journal of Population Data Science (Feb 2020)
Empowering knowledge generation through international data network: the IMeCCHI-DATANETWORK
Abstract
Introduction The International Methodology Consortium for Coded Health Information (IMeCCHI) is a collaboration of health services researchers who promote methodological advances in coded health information. The IMeCCHI-DATANETWORK initiative focuses on developing a multi-purpose distributed data infrastructure and a common data model (CDM) to enable cross-border data sharing and international comparisons.

Methods IMeCCHI consortium partners from six countries (Canada, Denmark, Italy, New Zealand, South Korea, and Switzerland) used a questionnaire to describe their original databases, which differ in size, structure, content and coding systems. To standardize these data, they agreed on a CDM and mapped their population-based databases to the CDM specifications. At the end of this process, local data were more homogeneous in content and structure, making them syntactically and semantically interoperable. Data transformation was performed using common data management software called TheMatrix.

Results The CDM encompasses four tables of structured data (person characteristics, hospitalizations, outpatient prescription medication and death), linked at the individual level through a person identifier. It can be used to answer research questions across countries using locally converted databases, which facilitates study replication in a distributed fashion. As a proof-of-concept study, an initial research question was addressed using an agreed protocol. Local data were converted into CSV files following the CDM structure, and TheMatrix was tested for transforming the standardized data from each partner into local analytical datasets. This allowed results to be shared between countries, whilst maintaining local control over each region's data.
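The CDM can be pictured as four record types that share a person identifier. The sketch below is a hypothetical Python illustration, not TheMatrix and not the consortium's actual specification. The field names (`birth_year`, `diagnosis_code`, `drug_code`, etc.), the helper names `to_csv` and `local_summary`, and the example protocol (deaths within 30 days of a first hospitalization whose diagnosis code matches a prefix) are all assumptions, and the toy records are invented. It shows how one site could export its tables in a CDM-like CSV layout and run a shared protocol locally, returning only aggregate counts.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields
from datetime import date

# Hypothetical CDM tables; field names are illustrative, not the IMeCCHI specification.
@dataclass
class Person:
    person_id: str
    sex: str
    birth_year: int

@dataclass
class Hospitalization:
    person_id: str
    admission_date: date
    discharge_date: date
    diagnosis_code: str  # e.g. an ICD code; coding systems differ by country

@dataclass
class Prescription:
    person_id: str
    dispensing_date: date
    drug_code: str  # e.g. an ATC code (assumed)

@dataclass
class Death:
    person_id: str
    death_date: date


def to_csv(rows, cls) -> str:
    """Serialize records of type `cls` to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(cls)])
    for row in rows:
        writer.writerow(astuple(row))  # dates are written as ISO strings via str()
    return buf.getvalue()


def local_summary(persons, hospitalizations, deaths, code_prefix, window_days=30):
    """Apply one shared (hypothetical) protocol to a site's tables; return counts only."""
    known = {p.person_id for p in persons}
    death_by_id = {d.person_id: d.death_date for d in deaths}
    # Keep each person's earliest hospitalization whose code matches the prefix.
    index_stay = {}
    for h in hospitalizations:
        if h.person_id in known and h.diagnosis_code.startswith(code_prefix):
            prev = index_stay.get(h.person_id)
            if prev is None or h.admission_date < prev.admission_date:
                index_stay[h.person_id] = h
    # Count deaths within `window_days` of the index admission, linked by person_id.
    n_deaths = sum(
        1
        for pid, h in index_stay.items()
        if pid in death_by_id
        and 0 <= (death_by_id[pid] - h.admission_date).days <= window_days
    )
    return {"n_index": len(index_stay), "n_deaths": n_deaths}


# Invented toy records for a single site.
persons = [Person("a", "F", 1950), Person("b", "M", 1960), Person("c", "F", 1970)]
hospitalizations = [
    Hospitalization("a", date(2018, 1, 1), date(2018, 1, 5), "I21.0"),
    Hospitalization("b", date(2018, 2, 1), date(2018, 2, 10), "I21.9"),
    Hospitalization("c", date(2018, 3, 1), date(2018, 3, 2), "J18.9"),
]
deaths = [Death("a", date(2018, 1, 20)), Death("b", date(2019, 1, 1))]

summary = local_summary(persons, hospitalizations, deaths, code_prefix="I21")
print(summary)  # {'n_index': 2, 'n_deaths': 1}
print(to_csv(deaths, Death))
```

Only the returned counts would leave the site, so row-level records stay under local control; the consortium's real protocol, variables and tooling may differ from this illustration.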
Conclusion The IMeCCHI-DATANETWORK, a model of a distributed data network, demonstrated that it is feasible to analyze international data using standardized analytical methods that enable independent analyses by regions without relocating datasets, thereby respecting local confidentiality obligations. The distributed data infrastructure can produce results that can be generalized to several countries, while facilitating cross-border data sharing and international comparisons.
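In a distributed network of this kind, only aggregate results need to travel to a coordinator. A minimal sketch, assuming each site returns counts in a hypothetical `SiteResult` format: the hypothetical `pool` function computes per-site proportions and a crude pooled proportion. Crude pooling is a simplifying assumption here; a real multi-country study might instead use stratified or meta-analytic methods. The site names and counts below are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteResult:
    """Aggregate counts returned by one site; no row-level data (hypothetical format)."""
    site: str
    n_index: int
    n_deaths: int


def pool(results):
    """Combine site aggregates into per-site and crude pooled proportions."""
    per_site = {
        r.site: (r.n_deaths / r.n_index if r.n_index else None) for r in results
    }
    total_index = sum(r.n_index for r in results)
    total_deaths = sum(r.n_deaths for r in results)
    overall = total_deaths / total_index if total_index else None
    return per_site, overall


# Invented counts from two hypothetical sites.
results = [SiteResult("site_A", 200, 20), SiteResult("site_B", 300, 45)]
per_site, overall = pool(results)
print(per_site)  # {'site_A': 0.1, 'site_B': 0.15}
print(overall)   # 0.13
```

Because the coordinator never receives individual records, each region keeps control of its data while still contributing to a cross-country comparison.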
Keywords