Informatics in Medicine Unlocked (Jan 2021)
A methodology for cohort harmonisation in multicentre clinical research
Abstract
Many clinical trials and scientific studies have been conducted to better understand specific medical conditions. However, these studies often rely on a small number of participants because it is difficult to find people who share similar medical characteristics and are available to participate. This is particularly critical in rare diseases, where the small number of subjects hinders reliable findings. To generate more substantial clinical evidence by increasing the statistical power of the analyses, researchers have started to perform data harmonisation and multi-cohort analyses. However, analysing heterogeneous data sources means dealing with different data structures, terminologies, concepts, languages and, most importantly, the knowledge behind the data. In this paper, we present a methodology to harmonise different cohorts into a standard data schema, helping the research community to generate evidence from a wider variety of data sources. Our methodology was inspired by the OHDSI Common Data Model, which aims to harmonise EHR datasets for observational studies, leveraging existing knowledge and open-source tools to perform multicentre disease-specific studies. The proposal was validated using Alzheimer’s Disease cohorts from several countries, ultimately combining 6,669 subjects and 172 clinical concepts. The harmonised datasets now enable multi-cohort querying and analysis, supporting new research. The methodology was implemented in Python and is available, under the MIT licence, at https://bioinformatics-ua.github.io/CMToolkit/.
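As a rough illustration of what harmonising cohorts into a standard schema can involve, the following minimal sketch maps two toy cohorts with different column names and value encodings onto one shared record layout, then runs a multi-cohort query. It is not the CMToolkit API: the cohort names, column names, concept names, and the functions `harmonise` and `query` are all hypothetical, and the data is invented for the example.

```python
# Hypothetical sketch of cohort harmonisation; not the CMToolkit API.
# All cohort names, columns, concepts, and values below are invented.

# Per-cohort mapping: source column -> (standard concept, value converter).
COHORT_MAPPINGS = {
    "cohort_a": {
        "MMSE_total": ("mmse_score", int),
        "sex": ("gender", lambda v: {"M": "male", "F": "female"}[v]),
    },
    "cohort_b": {
        "mmse": ("mmse_score", int),
        "geschlecht": ("gender", lambda v: {"m": "male", "w": "female"}[v]),
    },
}


def harmonise(cohort, rows, id_column):
    """Convert source rows into long-format records in the shared schema."""
    mapping = COHORT_MAPPINGS[cohort]
    records = []
    for row in rows:
        for source_column, (concept, convert) in mapping.items():
            raw = row.get(source_column)
            if raw is None or raw == "":
                continue  # skip missing measurements instead of guessing
            records.append({
                "cohort": cohort,
                "person_id": f"{cohort}:{row[id_column]}",
                "concept": concept,
                "value": convert(raw),
            })
    return records


def query(records, concept):
    """Return all harmonised records for one standard concept."""
    return [r for r in records if r["concept"] == concept]


cohort_a_rows = [
    {"id": "1", "MMSE_total": "27", "sex": "F"},
    {"id": "2", "MMSE_total": "", "sex": "M"},
]
cohort_b_rows = [{"pid": "7", "mmse": "21", "geschlecht": "w"}]

harmonised = (
    harmonise("cohort_a", cohort_a_rows, "id")
    + harmonise("cohort_b", cohort_b_rows, "pid")
)
mmse_records = query(harmonised, "mmse_score")
```

Keeping the mapping as data, separate from the transformation code, means that adding a cohort in this sketch only requires a new mapping entry. A real harmonisation would also need to handle units, vocabularies, and dates, which are omitted here.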