BIO Web of Conferences (Jan 2023)
N-Grams Modeling for Protein Secondary Structure Prediction: Exploring Local Features and Optimal CNN Parameters
Abstract
This study explores the potential of n-gram modeling for protein secondary structure prediction. Experiments are conducted on three datasets using bigrams, trigrams, and a combination of the best-performing n-grams with PSSM profiles. Optimal parameters for Convolutional Neural Networks (CNNs) are also investigated. Results indicate that bigrams outperform trigrams in Q8 accuracy, and that adding PSSM profiles as a further feature can improve model performance. Deeper convolution layers and larger convolution sizes enhance accuracy. Bigrams and trigrams show similar performance trends, with bigrams slightly more effective. The study offers insights into n-grams as a local feature extraction method for protein modeling. These findings contribute to protein structure analysis and bioinformatics, and may facilitate improved protein function prediction.
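To make the notion of bigram and trigram features concrete, the following is a minimal sketch of how overlapping n-grams might be extracted from an amino-acid sequence and mapped to integer indices. It is not the authors' pipeline: the function names (`extract_ngrams`, `build_vocabulary`, `encode`), the 20-letter alphabet, the overlapping stride of one residue, and the handling of unknown residues are all assumptions made for illustration.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_ngrams(sequence, n):
    """Return the overlapping n-grams of a sequence (stride of one residue)."""
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def build_vocabulary(n):
    """Map every possible n-gram over AMINO_ACIDS to a unique integer index."""
    return {
        "".join(letters): index
        for index, letters in enumerate(product(AMINO_ACIDS, repeat=n))
    }


def encode(sequence, n, vocabulary):
    """Encode a sequence as n-gram indices.

    Assumption: any n-gram containing a non-standard residue (for example 'X')
    is mapped to a single shared 'unknown' index equal to len(vocabulary).
    """
    unknown_index = len(vocabulary)
    return [vocabulary.get(gram, unknown_index) for gram in extract_ngrams(sequence, n)]


if __name__ == "__main__":
    example = "MKVLA"  # hypothetical sequence fragment
    print(extract_ngrams(example, 2))
    print(extract_ngrams(example, 3))
    print(encode(example, 2, build_vocabulary(2)))
```

Under these assumptions the bigram vocabulary has 20² = 400 entries and the trigram vocabulary 20³ = 8000, so trigram features are considerably sparser for the same amount of training data.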
Keywords