JMIR Research Protocols (Jul 2024)
Combining Federated Machine Learning and Qualitative Methods to Investigate Novel Pediatric Asthma Subtypes: Protocol for a Mixed Methods Study
Abstract
Background: Pediatric asthma is a heterogeneous disease; however, current characterizations of its subtypes are limited. Machine learning (ML) methods are well suited to identifying subtypes. In particular, deep neural networks can learn patient representations by leveraging longitudinal information captured in electronic health records (EHRs) while considering future outcomes. However, the traditional approach to subtype analysis requires large amounts of EHR data, which may contain protected health information and therefore raise concerns about patient privacy. Federated learning is a key technology for addressing these privacy concerns while preserving the accuracy and performance of ML algorithms. It could enable multisite development and implementation of ML algorithms and thereby facilitate the translation of artificial intelligence into clinical practice.
Objective: The aim of this study is to develop a research protocol for implementing federated ML across a large clinical research network to identify pediatric asthma subtypes and their progression over time.
Methods: This mixed methods study uses data and clinicians from the OneFlorida+ clinical research network, a large regional network covering linked, longitudinal, patient-level real-world data (RWD) of over 20 million patients from Florida, Georgia, and Alabama in the United States. To characterize the subtypes, we will use OneFlorida+ data from 2011 to 2023 and develop a research-grade pediatric asthma computable phenotype and a clinical natural language processing pipeline to identify pediatric patients with asthma aged 2-18 years. We will then apply federated learning to characterize pediatric asthma subtypes and their temporal progression.
Using the Promoting Action on Research Implementation in Health Services framework, we will conduct focus groups with practicing pediatric asthma clinicians within the OneFlorida+ network to investigate the clinical utility of the subtypes. With a user-centered design, we will create prototypes to visualize the subtypes in the EHR to best assist with the clinical management of children with asthma.
Results: OneFlorida+ data from 2011 to 2023 have been collected for 411,628 patients aged 2-18 years, along with 11,156,148 clinical notes. We expect to complete the computable phenotyping within the first year of the project, followed by subtyping during the second and third years, and then to conduct the focus groups and establish the user-centered design in the fourth and fifth years.
Conclusions: Pediatric asthma subtypes incorporating RWD from diverse populations could improve patient outcomes by moving the field closer to precision pediatric asthma care. Our privacy-preserving federated learning methodology and qualitative implementation work will address several challenges of applying ML to large, multicenter RWD.
International Registered Report Identifier (IRRID): DERR1-10.2196/57981