BMC Bioinformatics (Mar 2022)

Interpretation of convolutional neural networks reveals crucial sequence features involving in transcription during fiber development

  • Shang Liu,
  • Hailiang Cheng,
  • Javaria Ashraf,
  • Youping Zhang,
  • Qiaolian Wang,
  • Limin Lv,
  • Man He,
  • Guoli Song,
  • Dongyun Zuo

DOI
https://doi.org/10.1186/s12859-022-04619-9
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 15

Abstract

Read online

Abstract Background Upland cotton provides the most natural fiber in the world. During fiber development, the quality and yield of fiber were influenced by gene transcription. Revealing sequence features related to transcription has a profound impact on cotton molecular breeding. We applied convolutional neural networks to predict gene expression status based on the sequences of gene transcription start regions. After that, a gradient-based interpretation and an N-adjusted kernel transformation were implemented to extract sequence features contributing to transcription. Results Our models had approximate 80% accuracies, and the area under the receiver operating characteristic curve reached over 0.85. Gradient-based interpretation revealed 5' untranslated region contributed to gene transcription. Furthermore, 6 DOF binding motifs and 4 transcription activator binding motifs were obtained by N-adjusted kernel-motif transformation from models in three developmental stages. Apart from 10 general motifs, 3 DOF5.1 genes were also detected. In silico analysis about these motifs’ binding proteins implied their potential functions in fiber formation. Besides, we also found some novel motifs in plants as important sequence features for transcription. Conclusions In conclusion, the N-adjusted kernel transformation method could interpret convolutional neural networks and reveal important sequence features related to transcription during fiber development. Potential functions of motifs interpreted from convolutional neural networks could be validated by further wet-lab experiments and applied in cotton molecular breeding.

Keywords