npj Digital Medicine (May 2023)
Ontologizing health systems data at scale: making translational discovery a reality
- Tiffany J. Callahan,
- Adrianne L. Stefanski,
- Jordan M. Wyrwa,
- Chenjie Zeng,
- Anna Ostropolets,
- Juan M. Banda,
- William A. Baumgartner,
- Richard D. Boyce,
- Elena Casiraghi,
- Ben D. Coleman,
- Janine H. Collins,
- Sara J. Deakyne Davies,
- James A. Feinstein,
- Asiyah Y. Lin,
- Blake Martin,
- Nicolas A. Matentzoglu,
- Daniella Meeker,
- Justin Reese,
- Jessica Sinclair,
- Sanya B. Taneja,
- Katy E. Trinkley,
- Nicole A. Vasilevsky,
- Andrew E. Williams,
- Xingmin A. Zhang,
- Joshua C. Denny,
- Patrick B. Ryan,
- George Hripcsak,
- Tellen D. Bennett,
- Melissa A. Haendel,
- Peter N. Robinson,
- Lawrence E. Hunter,
- Michael G. Kahn
Affiliations
- Tiffany J. Callahan
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus
- Adrianne L. Stefanski
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus
- Jordan M. Wyrwa
- Department of Physical Medicine and Rehabilitation, School of Medicine, University of Colorado Anschutz Medical Campus
- Chenjie Zeng
- National Human Genome Research Institute, National Institutes of Health
- Anna Ostropolets
- Department of Biomedical Informatics, Columbia University Irving Medical Center
- Juan M. Banda
- Department of Computer Science, Georgia State University
- William A. Baumgartner
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus
- Richard D. Boyce
- Department of Biomedical Informatics, University of Pittsburgh School of Medicine
- Elena Casiraghi
- Computer Science, Università degli Studi di Milano
- Ben D. Coleman
- The Jackson Laboratory for Genomic Medicine
- Janine H. Collins
- Department of Haematology, University of Cambridge
- Sara J. Deakyne Davies
- Department of Research Informatics & Data Science, Analytics Resource Center, Children’s Hospital Colorado
- James A. Feinstein
- Adult and Child Center for Health Outcomes Research and Delivery Science (ACCORDS), University of Colorado Anschutz School of Medicine
- Asiyah Y. Lin
- National Human Genome Research Institute, National Institutes of Health
- Blake Martin
- Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine
- Nicolas A. Matentzoglu
- Semanticly
- Daniella Meeker
- Yale School of Medicine
- Justin Reese
- Division of Environmental Genomics and Systems Biology, Lawrence Berkeley National Laboratory
- Jessica Sinclair
- HealthLinc
- Sanya B. Taneja
- Intelligent Systems Program, University of Pittsburgh
- Katy E. Trinkley
- Department of Family Medicine, University of Colorado Anschutz School of Medicine
- Nicole A. Vasilevsky
- Translational and Integrative Sciences Lab, University of Colorado Anschutz Medical Campus
- Andrew E. Williams
- Tufts Institute for Clinical Research and Health Policy Studies, Tufts University
- Xingmin A. Zhang
- The Jackson Laboratory for Genomic Medicine
- Joshua C. Denny
- National Human Genome Research Institute, National Institutes of Health
- Patrick B. Ryan
- Janssen Research and Development
- George Hripcsak
- Department of Biomedical Informatics, Columbia University Irving Medical Center
- Tellen D. Bennett
- Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine
- Melissa A. Haendel
- Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine
- Peter N. Robinson
- The Jackson Laboratory for Genomic Medicine
- Lawrence E. Hunter
- Computational Bioscience Program, University of Colorado Anschutz Medical Campus
- Michael G. Kahn
- Department of Biomedical Informatics, University of Colorado School of Medicine
- DOI
- https://doi.org/10.1038/s41746-023-00830-x
- Journal volume & issue
- Vol. 6, no. 1, pp. 1–18
Abstract
Common data models solve many challenges of standardizing electronic health record (EHR) data but are unable to semantically integrate all of the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide computable representations of biological knowledge and enable the integration of heterogeneous data. However, mapping EHR data to OBO ontologies requires significant manual curation and domain expertise. We introduce OMOP2OBO, an algorithm for mapping Observational Medical Outcomes Partnership (OMOP) vocabularies to OBO ontologies. Using OMOP2OBO, we produced mappings for 92,367 conditions, 8611 drug ingredients, and 10,673 measurement results, which covered 68–99% of concepts used in clinical practice when examined across 24 hospitals. When used to phenotype rare disease patients, the mappings helped systematically identify undiagnosed patients who might benefit from genetic testing. By aligning OMOP vocabularies to OBO ontologies, our algorithm presents new opportunities to advance EHR-based deep phenotyping.