BMC Bioinformatics (Nov 2021)

GSA: an independent development algorithm for calling copy number and detecting homologous recombination deficiency (HRD) from target capture sequencing

  • Dongju Chen,
  • Minghui Shao,
  • Pei Meng,
  • Chunli Wang,
  • Qi Li,
  • Yuhang Cai,
  • Chengcheng Song,
  • Xi Wang,
  • Taiping Shi

DOI
https://doi.org/10.1186/s12859-021-04487-9
Journal volume & issue
Vol. 22, no. 1
pp. 1 – 19

Abstract

Read online

Abstract Background The gain or loss of large chromosomal regions or even whole chromosomes is termed as genomic scarring and can be observed as copy number variations resulting from the failure of DNA damage repair. Results In this study, a new algorithm called genomic scar analysis (GSA) has developed and validated to calculate homologous recombination deficiency (HRD) score. The two critical submodules were tree recursion (TR) segmentation and filtering, and the estimation and correction of the tumor purity and ploidy. Then, this study evaluated the rationality of segmentation and genotype identification by the GSA algorithm and compared with other two algorithms, PureCN and ASCAT, found that the segmentation result of GSA algorithm was more logical. In addition, the results indicated that the GSA algorithm had an excellent predictive effect on tumor purity and ploidy, if the tumor purity was more than 20%. Furtherly, this study evaluated the HRD scores and BRCA1/2 deficiency status of 195 clinical samples, and the results indicated that the accuracy was 0.98 (comparing with Affymetrix OncoScan™ assay) and the sensitivity was 95.2% (comparing with BRCA1/2 deficiency status), both were well-behaved. Finally, HRD scores and 16 genes mutations (TP53 and 15 HRR pathway genes) were analyzed in 17 cell lines, the results showed that there was higher frequency in HRR pathway genes in high HRD score samples. Conclusions This new algorithm, named as GSA, could effectively and accurately calculate the purity and ploidy of tumor samples through NGS data, and then reflect the degree of genomic instability and large-scale copy number variations of tumor samples.

Keywords