Genome Biology (Mar 2025)
DAGIP: alleviating cell-free DNA sequencing biases with optimal transport
Abstract
Cell-free DNA (cfDNA) is a rich source of biomarkers for various pathophysiological conditions. Preanalytical variables, such as the library preparation protocol or sequencing platform, are major confounders of cfDNA analysis. We present DAGIP, a novel data correction method built on optimal transport theory and deep learning that explicitly corrects for the effect of such preanalytical variables and can infer technical biases. Our method improves cancer detection and copy number alteration analysis by alleviating the sources of variation that are not of biological origin. It also enhances fragmentomic analysis of cfDNA. DAGIP allows the integration of cohorts from different studies.
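To give a rough sense of how optimal transport can move samples from one technical domain onto another, the sketch below applies generic entropy-regularised optimal transport (Sinkhorn iterations) followed by a barycentric projection. It is not the DAGIP algorithm, which the abstract describes only at a high level. The function names (`sinkhorn_plan`, `barycentric_map`), the regularisation value, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, reg=0.1, n_iter=1000):
    """Entropy-regularised transport plan between two uniform empirical
    distributions (hypothetical helper, not part of DAGIP)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # uniform weights on source samples
    b = np.full(m, 1.0 / m)  # uniform weights on target samples
    K = np.exp(-cost / reg)  # Gibbs kernel
    u = np.ones(n)
    for _ in range(n_iter):
        v = b / (K.T @ u)  # rescale to match column marginals
        u = a / (K @ v)    # rescale to match row marginals
    return u[:, None] * K * v[None, :]


def barycentric_map(source, target, reg=0.1):
    """Map each source sample to the plan-weighted average of target samples."""
    # Squared Euclidean cost, scaled to [0, 1] for numerical stability.
    cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=2)
    cost = cost / cost.max()
    plan = sinkhorn_plan(cost, reg=reg)
    return (plan @ target) / plan.sum(axis=1, keepdims=True)


# Simulated data (assumption): rows are samples, columns are genomic bins.
rng = np.random.default_rng(0)
target = rng.normal(size=(40, 5))                # reference protocol
source = rng.normal(size=(50, 5)) * 1.3 + 2.0    # shifted by a technical bias
corrected = barycentric_map(source, target)
```

After the mapping, each row of `corrected` lies within the convex hull of the target samples, so the simulated shift between the two domains is largely removed. A real correction method would also need to preserve biological signal, which this sketch does not attempt.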