Lessons and tips for designing a machine learning study using EHR data

Jaron Arbet; Cole Brokamp; Jareen Meinzen-Derr; Katy E. Trinkley; Heidi M. Spratt

doi:10.1017/cts.2020.513

Journal of Clinical and Translational Science (Jan 2021)

Affiliations

Jaron Arbet: Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado-Denver Anschutz Medical Campus, Aurora, CO, USA
Cole Brokamp: Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA; Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
Jareen Meinzen-Derr: Department of Pediatrics, University of Cincinnati College of Medicine, Cincinnati, OH, USA; Division of Biostatistics and Epidemiology, Cincinnati Children’s Hospital Medical Center, Cincinnati, OH, USA
Katy E. Trinkley: Department of Clinical Pharmacy, Skaggs School of Pharmacy and Pharmaceutical Sciences, University of Colorado, Aurora, CO, USA; Department of Medicine, School of Medicine, University of Colorado, Aurora, CO, USA
Heidi M. Spratt: Department of Preventive Medicine and Population Health, University of Texas Medical Branch, Galveston, TX, USA

DOI: https://doi.org/10.1017/cts.2020.513
Journal volume & issue: Vol. 5

Abstract

Machine learning (ML) provides the ability to examine massive datasets and uncover patterns within data without relying on a priori assumptions such as specific variable associations, linearity in relationships, or prespecified statistical interactions. However, the application of ML to healthcare data has been met with mixed results, especially when using administrative datasets such as the electronic health record (EHR). The black box nature of many ML algorithms contributes to an erroneous assumption that these algorithms can overcome major data issues inherent in large administrative healthcare data. As with other research endeavors, good data and sound analytic design are crucial to ML-based studies. In this paper, we provide an overview of common misconceptions about ML, the corresponding truths, and suggestions for incorporating these methods into healthcare research while maintaining a sound study design.

Published in Journal of Clinical and Translational Science

ISSN: 2059-8661 (Online)
Publisher: Cambridge University Press
Country of publisher: United Kingdom
LCC subjects: Medicine
Website: https://www.cambridge.org/core/journals/journal-of-clinical-and-translational-science
