Advanced Intelligent Systems (Apr 2024)

ASTK: A Machine Learning‐Based Integrative Software for Alternative Splicing Analysis

  • Shenghui Huang,
  • Jiangshuang He,
  • Lei Yu,
  • Jun Guo,
  • Shangying Jiang,
  • Zhaoxia Sun,
  • Linghui Cheng,
  • Xing Chen,
  • Xiang Ji,
  • Yi Zhang

DOI
https://doi.org/10.1002/aisy.202300594
Journal volume & issue
Vol. 6, no. 4

Abstract


Alternative splicing (AS) is a fundamental mechanism that regulates gene expression in both physiological and pathological processes. This article introduces ASTK, a software package covering upstream and downstream analysis of AS. First, ASTK offers a module that performs enrichment analysis at both the gene and exon levels, capturing the differing impacts of distinct splicing events on a single gene. We further cluster AS genes and alternative exons into three groups based on spliced exon size (micro‐, mid‐, and macro‐), which are preferentially associated with distinct biological pathways. A major challenge in the field has been decoding the regulatory codes of splicing. ASTK extracts both sequence features and epigenetic marks associated with AS events. By applying machine learning algorithms, we identified pivotal features influencing the inclusion levels of most AS types. Notably, splice site strength is a primary determinant of inclusion levels for alternative 3’/5’ splice sites (A3/A5). For the alternative first exon and skipping exon classes, a combination of sequence and epigenetic features dictates exon inclusion or exclusion. Our findings underscore ASTK's capability to enhance the functional understanding of AS events and shed light on the intricacies of splicing regulation.
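
To make the size-based grouping concrete, the following is a minimal Python sketch of binning exons into micro-, mid-, and macro- classes by length. It is not ASTK's code: the abstract does not state the cutoffs, so `MICRO_MAX` and `MID_MAX` are hypothetical placeholders, and the function name, gene names, and coordinates are invented for illustration.

```python
from collections import Counter

# Hypothetical cutoffs in nucleotides. The abstract does not give the
# values ASTK uses, so these are placeholders, not the published thresholds.
MICRO_MAX = 51
MID_MAX = 250


def size_class(exon_start: int, exon_end: int) -> str:
    """Assign an exon to a size class; coordinates are 0-based, half-open."""
    length = exon_end - exon_start
    if length <= 0:
        raise ValueError("exon_end must be greater than exon_start")
    if length <= MICRO_MAX:
        return "micro"
    if length <= MID_MAX:
        return "mid"
    return "macro"


# Made-up exons: (gene name, start, end)
exons = [("geneA", 100, 127), ("geneB", 500, 640), ("geneC", 1000, 1900)]

# Map each gene to the size class of its alternative exon
groups = {name: size_class(start, end) for name, start, end in exons}

# Count how many exons fall into each class
counts = Counter(groups.values())
print(groups)
print(counts)
```

Each group could then be passed separately to an enrichment analysis, which is the kind of comparison the abstract describes; how ASTK does this internally is not specified here.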

Keywords