Cell Discovery (Feb 2024)
Deep learning models incorporating endogenous factors beyond DNA sequences improve the prediction accuracy of base editing outcomes
Abstract
Abstract Adenine base editors (ABEs) and cytosine base editors (CBEs) enable the single nucleotide editing of targeted DNA sites avoiding generation of double strand breaks, however, the genomic features that influence the outcomes of base editing in vivo still remain to be characterized. High-throughput datasets from lentiviral integrated libraries were used to investigate the sequence features affecting base editing outcomes, but the effects of endogenous factors beyond the DNA sequences are still largely unknown. Here the base editing outcomes of ABE and CBE were evaluated in mammalian cells for 5012 endogenous genomic sites and 11,868 genome-integrated target sequences, with 4654 genomic sites sharing the same target sequences. The comparative analyses revealed that the editing outcomes of ABE and CBE at endogenous sites were substantially different from those obtained using genome-integrated sequences. We found that the base editing efficiency at endogenous target sites of both ABE and CBE was influenced by endogenous factors, including epigenetic modifications and transcriptional activity. A deep-learning algorithm referred as BE_Endo, was developed based on the endogenous factors and sequence information from our genomic datasets, and it yielded unprecedented accuracy in predicting the base editing outcomes. These findings along with the developed computational algorithms may facilitate future application of BEs for scientific research and clinical gene therapy.