BMC Bioinformatics (May 2024)

ViroISDC: a method for calling integration sites of hepatitis B virus based on feature encoding

  • Lei Qiao,
  • Chang Li,
  • Wei Lin,
  • Xiaoqi He,
  • Jia Mi,
  • Yigang Tong,
  • Jingyang Gao

DOI
https://doi.org/10.1186/s12859-024-05763-0
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 18

Abstract

Background: Hepatitis B virus (HBV) integrates into human chromosomes and can lead to genomic instability and hepatocarcinogenesis. Current tools for detecting HBV integration sites lack accuracy and stability.

Results: This study proposes a deep learning-based method, named ViroISDC, for detecting integration sites. ViroISDC generates corresponding grammar rules and encodes the characteristics of the language data to predict integration sites accurately. Compared with Lumpy, Pindel, Seeksv, and SurVirus, ViroISDC exhibits better overall performance and is less sensitive to sequencing depth and integration sequence length, displaying good reliability, stability, and generality. Downstream analysis of the integration sites detected by ViroISDC reveals the integration patterns and features of HBV. HBV integration exhibits specific chromosomal preferences and tends to occur in cancerous tissue. Moreover, HBV integration frequency is higher in males than in females, and high-frequency integration sites are more likely to be located in hepatocarcinogenesis- and anti-cancer-related genes, which supports the reliability of ViroISDC.

Conclusions: The ViroISDC pipeline exhibits superior precision, stability, and reliability across various datasets compared with similar software. It is valuable for exploring HBV infection in the human body and has significant implications for the diagnosis, treatment, and prognosis assessment of hepatocellular carcinoma (HCC).

Keywords