Computer Methods and Programs in Biomedicine Update (Jan 2023)
Development of an OpenMRS-OMOP ETL tool to support informatics research and collaboration in LMICs
Abstract
Background: As more low- and middle-income countries (LMICs) implement electronic health record systems (EHRs), informatics has become an important component of global health. OpenMRS is a popular open-source EHR that has been implemented in over 60 countries. As in high-income countries, interoperability and research capabilities remain a challenge. The Observational Medical Outcomes Partnership (OMOP) common data model (CDM) is one of the most relevant CDMs for supporting EHR-based research and data sharing, but its adoption in LMICs has been limited. To address this gap, we developed an OpenMRS-to-OMOP extract, transform, and load (ETL) tool using Talend.

Methods: We built on existing documentation to develop a comprehensive concept map from OpenMRS to OMOP. The OMOP domains were reviewed for overlapping concepts in OpenMRS, and a core set of tables was selected for ETL development. We then identified specific variables in OpenMRS tables that mapped to OMOP domain fields. Finally, the ETL tool was developed using MySQL Workbench, PostgreSQL, and Talend.

Results: Seven of the 14 OMOP domains were selected for ETL pipeline development. The location, person, and provider domains required the fewest Talend job components: at most two tDBInput components, one tMap, and one tDBOutput. Care_site, observation_period, observation, and person_death all required additional Talend components to transform their respective data fields correctly. Transforming 9,932 OpenMRS observation records to OMOP took 15 min.

Conclusions: It is feasible to develop a free, open-source ETL pipeline that transforms clinical data in OpenMRS instances into OMOP. Processing large datasets is fast and scalable, with room for further improvement. Using this tool alongside OpenMRS can dramatically increase the potential for global health informatics collaborations and for building local infrastructure and research capacity. Further testing and development will be required before widespread dissemination, along with appropriate documentation and training resources.
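To make the pipeline shape concrete, the following is a minimal Python sketch of the extract, map, and load pattern that the Talend jobs follow (tDBInput, tMap, tDBOutput). It is not the authors' tool. It assumes in-memory SQLite in place of MySQL and PostgreSQL, heavily simplified OpenMRS `person`/`obs` tables and OMOP `person`/`observation_period`/`death` tables, and illustrative sample rows. The helper names (`make_sample_source`, `transform_person`, `load`, `run_etl`) are hypothetical, the `death` table is assumed to correspond to the person_death job named above, and the type concept ID 32817 is an assumption to verify against the OMOP vocabularies.

```python
import sqlite3

# OMOP standard gender concepts (MALE, FEMALE); unmapped codes fall back to 0
# ("No matching concept"), a common OMOP convention.
GENDER_MAP = {"M": 8507, "F": 8532}
# Assumed type concept for EHR-derived records; verify in the OMOP vocabularies.
EHR_TYPE_CONCEPT = 32817

# Simplified, hypothetical subsets of the real OpenMRS and OMOP schemas.
SOURCE_DDL = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender TEXT,
                     birthdate TEXT, dead INTEGER, death_date TEXT);
CREATE TABLE obs (obs_id INTEGER PRIMARY KEY, person_id INTEGER,
                  concept_id INTEGER, obs_datetime TEXT);
"""
TARGET_DDL = """
CREATE TABLE person (person_id INTEGER PRIMARY KEY, gender_concept_id INTEGER,
                     year_of_birth INTEGER, month_of_birth INTEGER,
                     day_of_birth INTEGER);
CREATE TABLE observation_period (observation_period_id INTEGER PRIMARY KEY,
                                 person_id INTEGER,
                                 observation_period_start_date TEXT,
                                 observation_period_end_date TEXT,
                                 period_type_concept_id INTEGER);
CREATE TABLE death (person_id INTEGER PRIMARY KEY, death_date TEXT,
                    death_type_concept_id INTEGER);
"""


def make_sample_source():
    """Build a tiny OpenMRS-like source database with illustrative rows."""
    src = sqlite3.connect(":memory:")
    src.executescript(SOURCE_DDL)
    src.executemany("INSERT INTO person VALUES (?,?,?,?,?)", [
        (1, "F", "1980-04-12", 0, None),
        (2, "M", "1975-11-03", 1, "2020-06-30")])
    src.executemany("INSERT INTO obs VALUES (?,?,?,?)", [
        (10, 1, 5089, "2019-01-05 09:00:00"),
        (11, 1, 5089, "2019-07-20 10:30:00"),
        (12, 2, 5085, "2020-02-14 08:15:00")])
    return src


def load(dst, table, rows):
    """Bulk-insert rows into a target table (analogous to tDBOutput)."""
    if not rows:
        return
    placeholders = ",".join("?" * len(rows[0]))
    dst.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


def transform_person(rows):
    """Map gender codes to concepts and split birthdate (analogous to tMap)."""
    out = []
    for person_id, gender, birthdate in rows:
        if birthdate is None:  # OMOP requires year_of_birth; skip these persons
            continue
        year, month, day = (int(part) for part in birthdate[:10].split("-"))
        out.append((person_id, GENDER_MAP.get(gender, 0), year, month, day))
    return out


def run_etl(src, dst):
    # person: one input, one mapping step, one output.
    people = src.execute(
        "SELECT person_id, gender, birthdate FROM person").fetchall()
    load(dst, "person", transform_person(people))

    # observation_period: needs an extra aggregation step (earliest and
    # latest observation date per person) before loading.
    spans = src.execute(
        "SELECT person_id, MIN(date(obs_datetime)), MAX(date(obs_datetime)) "
        "FROM obs GROUP BY person_id ORDER BY person_id").fetchall()
    load(dst, "observation_period",
         [(i, pid, start, end, EHR_TYPE_CONCEPT)
          for i, (pid, start, end) in enumerate(spans, start=1)])

    # death: filter deceased persons who have a recorded death date.
    deaths = src.execute(
        "SELECT person_id, date(death_date) FROM person "
        "WHERE dead = 1 AND death_date IS NOT NULL").fetchall()
    load(dst, "death", [(pid, d, EHR_TYPE_CONCEPT) for pid, d in deaths])
    dst.commit()


if __name__ == "__main__":
    source = make_sample_source()
    target = sqlite3.connect(":memory:")
    target.executescript(TARGET_DDL)
    run_etl(source, target)
    print(target.execute("SELECT * FROM observation_period").fetchall())
```

In this sketch the person mapping needs only a single input, mapping, and output, while observation_period needs an aggregation before loading; this is one plausible reason some tables in the Results required additional Talend components.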