BMJ Public Health (Mar 2024)
Estimating rare disease prevalence and costs in the USA: a cohort study approach using the Healthcare Cost Institute claims data
Abstract
Objective The study used national insurance claims data to gather information on patient characteristics and associated costs to better understand the diagnosis and treatment of rare diseases (RDs).
Materials and methods Data from the Healthcare Cost Institute (HCCI) data enclave were analysed using R statistical software and filtered by the International Classification of Diseases, 10th Revision (ICD-10) codes, Current Procedural Terminology codes and National Drug Codes associated with 14 RDs and their disease-modifying therapy options. Data were aggregated by prevalence, costs, patient characteristics and effects of treatment modification.
Results The prevalence and costs of RDs in the HCCI commercial claims database varied significantly across the USA and between urban and rural areas. Pharmacy costs increased when a new treatment was initiated, while non-pharmacy costs decreased.
Discussion Prevalence and cost estimates are highly variable because of the small number of patients with RDs, and the lack of a national healthcare database limits inferences for such patient populations. Accurate assessments require a diverse population, which can likely be achieved by analysing multiple databases. Prevalence estimation for RDs is hampered by a lack of specific disease coding and small patient populations, compounded by issues such as data standardisation and privacy concerns. Addressing these through improved data management in healthcare systems, increased research and education will lead to better diagnosis, care management and quality of life for patients with RD.
Conclusion Data on patients with RD in the HCCI database were analysed for prevalence, costs, patient characteristics and treatment modification effects. Significant heterogeneity in each of these factors was found across RDs, geography and locality (eg, urban and rural). Efforts to build machine learning capabilities that accelerate the diagnosis of RDs would benefit greatly from changes to healthcare data, such as standardising data input, linking databases, addressing privacy issues and assigning ICD-10 codes for all RDs, which would yield more robust data for RD analytics.