Unsupervised detection of fragment length signatures of circulating tumor DNA using non-negative matrix factorization
Gabriel Renaud,
Maibritt Nørgaard,
Johan Lindberg,
Henrik Grönberg,
Bram De Laere,
Jørgen Bjerggaard Jensen,
Michael Borre,
Claus Lindbjerg Andersen,
Karina Dalsgaard Sørensen,
Lasse Maretty,
Søren Besenbacher
Affiliations
Gabriel Renaud
Department of Health Technology, Section of Bioinformatics, Technical University of Denmark, Kongens Lyngby, Denmark; Department of Molecular Medicine, Aarhus University, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Maibritt Nørgaard
Department of Molecular Medicine, Aarhus University, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Johan Lindberg
Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
Henrik Grönberg
Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden
Bram De Laere
Department of Medical Epidemiology and Biostatistics, Karolinska Institute, Stockholm, Sweden; Cancer Research Institute Gent (CRIG), Ghent University, Ghent, Belgium; Department of Human Structure and Repair, Ghent University, Ghent, Belgium
Jørgen Bjerggaard Jensen
Department of Urology, Regional Hospital of West Jutland, Holstebro, Denmark
Michael Borre
Department of Urology, Aarhus University Hospital, Aarhus, Denmark
Claus Lindbjerg Andersen
Department of Molecular Medicine, Aarhus University, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Karina Dalsgaard Sørensen
Department of Molecular Medicine, Aarhus University, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark
Lasse Maretty
Department of Molecular Medicine, Aarhus University, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
Søren Besenbacher
Department of Molecular Medicine, Aarhus University, Aarhus, Denmark; Department of Clinical Medicine, Aarhus University, Aarhus, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus, Denmark
Abstract
Sequencing of cell-free DNA (cfDNA) is currently being used to detect cancer by searching for both mutational and non-mutational alterations. Recent work has shown that the length distribution of cfDNA fragments from a cancer patient can inform tumor load and type. Here, we propose non-negative matrix factorization (NMF) of fragment length distributions as a novel and completely unsupervised method for studying fragment length patterns in cfDNA. Using shallow whole-genome sequencing (sWGS) of cfDNA from a cohort of patients with metastatic castration-resistant prostate cancer (mCRPC), we demonstrate that NMF accurately infers the true tumor fragment length distribution as an NMF component, and that the sample weights of this component correlate with circulating tumor DNA (ctDNA) levels (r = 0.75). We further demonstrate that using several NMF components enables accurate cancer detection on data from various early-stage cancers (AUC = 0.96). Finally, we show that NMF, when applied across genomic regions, can be used to discover fragment length signatures associated with open chromatin.
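To make the core idea concrete, the following is a minimal sketch of NMF applied to a samples-by-fragment-length matrix. It is not the authors' implementation. It uses a generic multiplicative-update NMF (Lee and Seung, Frobenius loss) written in NumPy and runs on purely synthetic data. The Gaussian profiles, the peak lengths of 167 and 145 bp, the sample count, the noise level, and the helper names `nmf` and `gaussian_profile` are all illustrative assumptions, not values or functions from the study.

```python
import numpy as np


def nmf(V, rank, n_iter=500, seed=0, eps=1e-12):
    """Factorize a non-negative matrix V (samples x lengths) as W @ H.

    Generic Lee-Seung multiplicative updates for the Frobenius loss;
    a stand-in for whatever NMF solver the study actually used.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_lengths = V.shape
    W = rng.random((n_samples, rank)) + eps
    H = rng.random((rank, n_lengths)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    # Rescale so each row of H sums to 1 (a length distribution);
    # the sample weights in W absorb the scale.
    scale = H.sum(axis=1, keepdims=True)
    H /= scale
    W *= scale.T
    return W, H


def gaussian_profile(lengths, mean, sd):
    """Hypothetical fragment length distribution (Gaussian, normalized)."""
    p = np.exp(-0.5 * ((lengths - mean) / sd) ** 2)
    return p / p.sum()


# --- Synthetic cohort (all values are assumptions for illustration) ---
rng = np.random.default_rng(1)
lengths = np.arange(100, 221)                  # fragment lengths in bp
healthy = gaussian_profile(lengths, 167, 10)   # assumed non-tumor profile
tumor = gaussian_profile(lengths, 145, 12)     # assumed shorter tumor profile
tumor_fraction = rng.uniform(0.0, 0.6, size=30)

# Each sample is a mixture of the two profiles plus a little noise.
V = np.outer(1 - tumor_fraction, healthy) + np.outer(tumor_fraction, tumor)
V += rng.uniform(0, 1e-4, V.shape)
V /= V.sum(axis=1, keepdims=True)

# --- Unsupervised decomposition ---
W, H = nmf(V, rank=2)

# Treat the component with the shorter mean length as tumor-like.
mean_length = H @ lengths
tumor_idx = int(np.argmin(mean_length))
weights = W[:, tumor_idx] / W.sum(axis=1)

# Compare the inferred weights with the simulated tumor fraction.
r = np.corrcoef(weights, tumor_fraction)[0, 1]
print(f"Pearson r between component weight and simulated fraction: {r:.3f}")
```

In this toy setting, each row of `H` plays the role of a fragment length signature and each row of `W` gives a sample's weights on those signatures. NMF solutions are not unique, so the recovered components need not match the simulated profiles exactly, even when the weights track the simulated tumor fraction closely.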