Development of a deep learning model for cancer diagnosis by inspecting cell-free DNA end-motifs

Hongru Shen; Meng Yang; Jilei Liu; Kexin Chen; Xiangchun Li

doi:10.1038/s41698-024-00635-5

npj Precision Oncology (Jul 2024)

Affiliations

Hongru Shen: Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University
Meng Yang: Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University
Jilei Liu: Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University
Kexin Chen: Department of Epidemiology and Biostatistics, Tianjin’s Clinical Research Center for Cancer, Key Laboratory of Molecular Cancer Epidemiology of Tianjin, Key Laboratory of Prevention and Control of Major Diseases in the Population Ministry of Education, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University
Xiangchun Li: Tianjin Cancer Institute, Tianjin’s Clinical Research Center for Cancer, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University

DOI: https://doi.org/10.1038/s41698-024-00635-5
Journal volume & issue: Vol. 8, no. 1
pp. 1 – 12

Abstract

Accurate discrimination between patients with and without cancer from cell-free DNA (cfDNA) is crucial for early cancer diagnosis. Herein, we develop and validate a deep-learning-based model entitled end-motif inspection via transformer (EMIT) for discriminating individuals with and without cancer by learning feature representations from cfDNA end-motifs. EMIT is a self-supervised learning approach that models rankings of cfDNA end-motifs. We include 4606 samples subjected to different types of cfDNA sequencing to develop EMIT, and subsequently evaluate the classification performance of linear projections of EMIT representations on six datasets and an additional in-house testing set encompassing whole-genome, whole-genome bisulfite and 5-hydroxymethylcytosine sequencing. The linear projection of EMIT representations achieved area under the receiver operating characteristic curve (AUROC) values ranging from 0.895 (0.835–0.955) to 0.996 (0.994–0.997) across these six datasets, outperforming its baseline by significant margins. Additionally, we show that linear projection of EMIT representations can achieve an AUROC of 0.962 (0.914–1.0) in identifying lung cancer on an independent testing set subjected to whole-exome sequencing. The findings of this study indicate that a transformer-based deep learning model can learn cancer-discriminative representations from cfDNA end-motifs, and that these representations can be exploited to discriminate patients with and without cancer.
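The abstract rests on three ingredients: per-sample rankings of cfDNA end-motifs, a linear projection of fixed representations, and AUROC as the evaluation metric. The Python sketch below illustrates each one under stated assumptions. It does not reproduce EMIT's transformer or its self-supervised objective. Several details are assumptions rather than details from the paper: using 4-mer motifs taken from the 5′ end of each fragment, using a logistic-regression probe trained by plain gradient descent as the "linear projection", and the function names `end_motif_ranks`, `fit_linear_probe`, `probe_scores` and `auroc`, which are all hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "ACGT"


def end_motif_ranks(fragment_ends, k=4):
    """Rank every k-mer end-motif by its frequency in one sample (rank 0 = most frequent).

    Assumption (hypothetical): `fragment_ends` holds sequences read from the 5' end of
    each cfDNA fragment, and the end-motif is the first k bases. Motifs containing
    non-ACGT characters (e.g. N) are ignored; ties are broken alphabetically.
    """
    motifs = ["".join(p) for p in product(BASES, repeat=k)]
    counts = Counter(seq[:k].upper() for seq in fragment_ends if len(seq) >= k)
    order = sorted(motifs, key=lambda m: (-counts[m], m))
    return {motif: rank for rank, motif in enumerate(order)}


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_scores(X, w, b):
    """Linear projection w.x + b of each (frozen) representation vector."""
    return [sum(wj * xj for wj, xj in zip(w, x)) + b for x in X]


def fit_linear_probe(X, y, lr=0.5, epochs=1000):
    """Logistic-regression probe fit by full-batch gradient descent.

    Hypothetical stand-in for the paper's linear projection: the representations X
    stay fixed; only the weights w and the bias b are learned.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, z, label in zip(X, probe_scores(X, w, b), y):
            err = _sigmoid(z) - label  # derivative of the log-loss w.r.t. the score z
            grad_w = [g + err * xj for g, xj in zip(grad_w, x)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auroc(scores, labels):
    """AUROC as P(score of a random positive > score of a random negative), ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs only: made-up fragment ends and 1-D "representations", not study data.
    ranks = end_motif_ranks(["CCCAGT", "CCCAAA", "CCCAGG", "AAAAT", "NCGTA"])
    print(len(ranks), ranks["CCCA"], ranks["AAAA"])  # 256 0 1
    X, y = [[0.0], [0.2], [0.8], [1.0]], [0, 0, 1, 1]
    w, b = fit_linear_probe(X, y)
    print(auroc(probe_scores(X, w, b), y))  # 1.0
```

In a setting closer to the study, the probe would be fit on learned EMIT embeddings rather than on toy vectors, and AUROC would be computed on held-out samples. The pairwise AUROC shown here is quadratic in sample count, which is fine for illustration. A rank-based formula would scale better.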

Published in npj Precision Oncology

ISSN: 2397-768X (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens
Website: https://www.nature.com/npjprecisiononcology/
