Biomarker Research (Nov 2022)

Identification of risk variants related to malignant tumors in children with birth defects by whole genome sequencing

  • Yichuan Liu,
  • Hui-Qi Qu,
  • Xiao Chang,
  • Frank D Mentch,
  • Haijun Qiu,
  • Kenny Nguyen,
  • Xiang Wang,
  • Amir Hossein Saeidian,
  • Deborah Watson,
  • Joseph Glessner,
  • Hakon Hakonarson

DOI
https://doi.org/10.1186/s40364-022-00431-y
Journal volume & issue
Vol. 10, no. 1
pp. 1 – 9

Abstract

Background: Children with birth defects (BD) are more likely to develop cancer, and the increased risk persists into adulthood. Prior population-based assessments have demonstrated that even non-chromosomal BDs are associated with at least a two-fold increase in cancer risk. Identifying variants associated with malignant tumors in BD patients without chromosomal anomalies may improve our understanding of the underlying molecular mechanisms and provide clues for early cancer detection in children with BD.

Methods: Whole genome sequencing (WGS) data of blood-derived DNA from 1653 individuals without chromosomal anomalies were acquired from the Kids First Data Resource Center (DRC): 541 BD probands with at least one type of malignant tumor, 767 BD probands without a malignant tumor, and 345 healthy family members (parents or siblings of the probands). Recurrent variants seen exclusively in cancer patients were selected and mapped to their corresponding genomic regions. The targeted genes/non-coding RNAs were further reduced using random forest and forward feature selection (ffs) models.

Results: The filtered genes/non-coding RNAs, including variants in non-coding areas, showed enrichment in cancer-related pathways. To further support the validity of these variants, blood WGS data from an additional 40 independent BD probands from unrelated projects, including 25 patients with at least one type of cancer, were acquired. In this validation dataset, the counts of variants of interest identified in the Kids First data deviated clearly between BD patients with cancer and those without. Furthermore, a deep learning model was built to assess predictive ability in the 40 patients, using the variants of interest identified in the Kids First cohort as feature vectors. The accuracies were approximately 75%; notably, variants mapped to non-coding regions provided the highest accuracy (31 of 40 patients, or 77.5%, were labeled correctly).

Conclusion: We present for the first time a panorama of genetic variants that are associated with cancers in non-chromosomal BD patients, implying that our approach may potentially serve for the early detection of malignant tumors in patients with BD.
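The variant-selection step described in the Methods (recurrent variants seen exclusively in cancer patients) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name, the variant identifiers, the sample IDs, and the `min_recurrence` threshold are all hypothetical, since the abstract does not state how recurrence was defined.

```python
def exclusive_recurrent_variants(carriers, cancer_ids, min_recurrence=2):
    """Return variants carried only by cancer samples and by enough of them.

    carriers: dict mapping a variant ID to the set of sample IDs carrying it.
    cancer_ids: set of sample IDs for probands with a malignant tumor.
    min_recurrence: assumed minimum number of carriers (hypothetical threshold).
    """
    selected = {}
    for variant, samples in carriers.items():
        # Every carrier must belong to the cancer group.
        cancer_only = samples <= cancer_ids
        # The variant must recur in at least min_recurrence samples.
        recurrent = len(samples) >= min_recurrence
        if cancer_only and recurrent:
            selected[variant] = len(samples)
    return selected


# Toy data, invented purely for illustration.
cancer_ids = {"c1", "c2", "c3"}
carriers = {
    "chr1:1000:A>G": {"c1", "c2"},  # recurrent and cancer-only: kept
    "chr2:2000:C>T": {"c1", "n1"},  # also in a non-cancer sample: dropped
    "chr3:3000:G>A": {"c3"},        # seen once only: dropped
}

selected = exclusive_recurrent_variants(carriers, cancer_ids)
print(selected)
```

In a fuller workflow, the per-sample counts of the selected variants would form the feature matrix passed on to feature selection and classification, as the abstract outlines.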

Keywords