Communications Biology (Jun 2025)

A versatile CRISPR/Cas9 system off-target prediction tool using language model

  • Weian Du,
  • Liang Zhao,
  • Kaichuan Diao,
  • Yangyang Zheng,
  • Qianyong Yang,
  • Zhenzhen Zhu,
  • Xiangxing Zhu,
  • Dongsheng Tang

DOI
https://doi.org/10.1038/s42003-025-08275-6
Journal volume & issue
Vol. 8, no. 1
pp. 1 – 10

Abstract

Genome editing with the CRISPR/Cas9 system has revolutionized life and medical sciences, particularly in treating monogenic genetic diseases by enabling long-term therapeutic effects from a single intervention. However, the CRISPR/Cas9 system can tolerate mismatches and DNA/RNA bulges at target sites, leading to unintended off-target effects that pose challenges for gene-editing therapy development. Existing high-throughput detection and in silico prediction methods are often limited to specifically designed single guide RNAs (sgRNAs) and perform poorly on unseen sequences. To address these limitations, we introduce CCLMoff, a deep learning framework for off-target prediction that incorporates a pretrained RNA language model from RNAcentral. CCLMoff captures mutual sequence information between sgRNAs and target sites and is trained on a comprehensive, updated dataset. This approach enables accurate off-target identification and strong generalization across diverse NGS-based detection datasets. Model interpretation reveals the biological importance of the seed region, underscoring CCLMoff’s analytical capabilities. The development of CCLMoff lays the foundation for a comprehensive, end-to-end sgRNA design platform, enhancing both the precision and efficiency of CRISPR/Cas9-based therapeutics. CCLMoff is a versatile tool and is publicly available at github.com/duwa2/CCLMoff.
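
To make the mismatch tolerance and seed region mentioned in the abstract concrete, the following is a minimal sketch that compares an sgRNA with a candidate target site position by position. It is not CCLMoff's method and uses none of its code. The function name `find_mismatches`, the 12-nucleotide PAM-proximal seed length, and both sequences are assumptions chosen for illustration, and bulges are deliberately left out.

```python
from typing import List, Tuple

# Assumption: the seed is taken as the 12 PAM-proximal nucleotides of the
# protospacer. This value is illustrative and is not taken from the paper.
SEED_LENGTH = 12


def find_mismatches(
    sgrna: str, target: str, seed_length: int = SEED_LENGTH
) -> List[Tuple[int, str, str, bool]]:
    """Return (position, guide_base, target_base, in_seed) for each mismatch.

    Positions are 1-based, counted from the PAM-distal (5') end.
    Bulges are not handled: both sequences must have the same length.
    """
    # Write the guide in DNA letters so it can be compared with the target site.
    guide = sgrna.upper().replace("U", "T")
    site = target.upper()
    if len(guide) != len(site):
        raise ValueError("sequences must be the same length (no bulge support)")

    # First position that falls inside the PAM-proximal seed.
    seed_start = len(guide) - seed_length + 1

    mismatches = []
    for pos, (g, t) in enumerate(zip(guide, site), start=1):
        if g != t:
            mismatches.append((pos, g, t, pos >= seed_start))
    return mismatches


# Illustrative sequences only, not data from the paper.
guide = "GAGTCCGAGCAGAAGAAGAA"
off_target = "GAGTCTGAGCAGAAGAAGCA"
result = find_mismatches(guide, off_target)
for pos, g, t, in_seed in result:
    region = "seed" if in_seed else "non-seed"
    print(f"position {pos}: guide {g} vs target {t} ({region})")
```

A hand-written rule like this only counts and locates mismatches. A learned predictor such as the one the abstract describes is instead trained on detection datasets to score sgRNA and target site pairs.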