Journal of Medical Internet Research (May 2023)
Development of Indirect Health Data Linkage on Health Product Use and Care Trajectories in France: Systematic Review
Abstract
Background: European national disparities in the integration of data linkage (ie, the ability to match patient data between databases) into routine public health activities were recently highlighted. In France, the claims database covers almost the whole population from birth to death, offering great research potential for data linkage. Because the use of a common unique identifier to directly link personal data is often limited, linkage based on a set of indirect key identifiers has been developed; this approach raises a linkage quality challenge, namely minimizing errors in the linked data.
Objective: This systematic review aimed to analyze the type and quality of research publications on indirect data linkage on health product use and care trajectories in France.
Methods: A comprehensive search was conducted in the PubMed/Medline and Embase databases for all papers published up to December 31, 2022, that involved linked French databases and focused on health product use or care trajectories. Only studies based on indirect identifiers were included (ie, without a unique personal identifier available to easily link the databases). A descriptive analysis of data linkage, covering quality indicators and adherence to the Bohensky framework for evaluating data linkage studies, was also performed.
Results: In total, 16 papers were selected. Data linkage was performed at the national level in 7 (43.8%) studies and at the local level in 9 (56.2%) studies. The number of patients varied greatly, from 713 to 75,000 in the source databases and from 210 to 31,000 after linkage. The diseases studied were mainly chronic diseases and infections. The objectives of the data linkage were multiple: to estimate the risk of adverse drug reactions (ADRs; n=6, 37.5%), to reconstruct the patient’s care trajectory (n=5, 31.3%), to describe therapeutic uses (n=2, 12.5%), to evaluate the benefits of treatments (n=2, 12.5%), and to evaluate treatment adherence (n=1, 6.3%). Registries were the databases most frequently linked with French claims data. No studies examined linkage with a hospital data warehouse, a clinical trial database, or patient self-reported databases. The linkage approach was deterministic in 7 (43.8%) studies, probabilistic in 4 (25.0%) studies, and not specified in 5 (31.3%) studies. The linkage rate was mainly between 80% and 90% (reported in 11/15, 73.3%, studies). Assessment of adherence to the Bohensky framework for evaluating data linkage studies showed that the source databases for the linkage were always described but that the completion rate and accuracy of the variables to be linked were not systematically described.
Conclusions: This review highlights the growing interest in health data linkage in France. Nevertheless, regulatory, technical, and human constraints remain major obstacles to its deployment. The volume, variety, and validity of the data represent a real challenge, and advanced expertise and skills in statistical analysis and artificial intelligence are required to process these big data.
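To make the notions of deterministic linkage on indirect identifiers and of linkage rate concrete, the following is a minimal Python sketch and not a method taken from any of the reviewed studies. The key fields (`sex`, `birth_year`, `birth_month`, `postcode`, `admission_date`), the record layout, and the toy records are all hypothetical assumptions chosen for illustration. The rule used here, accepting a link only when exactly one candidate shares the full key, is one possible deterministic rule among several.

```python
from collections import defaultdict

# Hypothetical set of indirect identifiers; none of these names come from the review.
KEY_FIELDS = ("sex", "birth_year", "birth_month", "postcode", "admission_date")


def indirect_key(record):
    """Build the composite key from the indirect identifiers of one record."""
    return tuple(record[field] for field in KEY_FIELDS)


def deterministic_link(source, target):
    """Link a source record to a target record only when exactly one
    target record shares its full indirect key (exact-match rule)."""
    index = defaultdict(list)
    for rec in target:
        index[indirect_key(rec)].append(rec)

    links = {}
    for rec in source:
        candidates = index.get(indirect_key(rec), [])
        if len(candidates) == 1:  # unmatched or ambiguous records stay unlinked
            links[rec["id"]] = candidates[0]["id"]
    return links


def linkage_rate(links, source):
    """Share of source records for which a link was found."""
    return len(links) / len(source) if source else 0.0


def make(rec_id, sex, year, month, postcode, date):
    """Helper to build an illustrative record."""
    return {"id": rec_id, "sex": sex, "birth_year": year, "birth_month": month,
            "postcode": postcode, "admission_date": date}


# Toy data (invented): a small registry linked to a small claims extract.
registry = [
    make("r1", "F", 1950, 3, "33000", "2020-01-15"),  # one exact match
    make("r2", "M", 1962, 7, "75011", "2020-02-02"),  # two candidates: ambiguous
    make("r3", "F", 1980, 11, "69003", "2020-03-09"),  # no candidate
]
claims = [
    make("c1", "F", 1950, 3, "33000", "2020-01-15"),
    make("c2", "M", 1962, 7, "75011", "2020-02-02"),
    make("c3", "M", 1962, 7, "75011", "2020-02-02"),
    make("c4", "F", 1975, 5, "13001", "2020-04-20"),
]

links = deterministic_link(registry, claims)
rate = linkage_rate(links, registry)
print(links, f"linkage rate = {rate:.1%}")
```

In this sketch, ambiguous keys are deliberately left unlinked, which lowers the linkage rate but avoids false matches. A probabilistic approach would instead score partial agreement across the identifiers and accept links above a chosen threshold.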