NUML International Journal of Engineering and Computing (Apr 2022)
Classification of Biological Data using Deep Learning Technique
Abstract
Large numbers of newly sequenced proteins are discovered daily, and the volume of sequence data has grown exponentially over recent decades. Characterizing protein function through biological experiments is very expensive, yet finding associations within these datasets is necessary to create and improve medical tools. A central concern is how to extract useful characteristics of the sequences as input features for a network. Machine learning algorithms based on deep learning architectures and data-driven models have recently attracted considerable attention and are now widely used. Previous work has not adequately addressed two issues in the classification of biological sequences such as proteins: the efficient encoding of variable-length sequence data, and the implementation of deep learning based neural network models that improve the performance of classification and recognition systems. To address these issues, we propose a deep learning based neural network architecture that improves classification performance. Specifically, we propose a 1D convolutional neural network that classifies protein sequences into the 10 most common classes. The model extracts features from the labeled protein sequences and learns from the dataset. We trained and evaluated our model on protein sequences downloaded from the Protein Data Bank (PDB). The model achieves an accuracy of up to 96%.
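The encoding of variable-length sequences mentioned above can be illustrated with a minimal sketch. The abstract does not state which encoding scheme is used, so the following is an assumption: each residue is mapped to an integer index, and every sequence is truncated or zero-padded to a fixed length so that it can be fed to a 1D convolutional network. The names `encode_sequence`, `encode_batch`, and `max_len` are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: the encoding scheme is an assumption,
# not the method described in the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
PAD = 0                                 # index reserved for padding
UNKNOWN = len(AMINO_ACIDS) + 1          # index for any other symbol (e.g. X)
INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_sequence(sequence, max_len):
    """Map a protein sequence to a fixed-length list of integer codes.

    Sequences longer than max_len are truncated; shorter ones are
    right-padded with PAD.
    """
    codes = [INDEX.get(res, UNKNOWN) for res in sequence.upper()[:max_len]]
    return codes + [PAD] * (max_len - len(codes))


def encode_batch(sequences, max_len):
    """Encode several sequences so that all rows share the same length."""
    return [encode_sequence(seq, max_len) for seq in sequences]
```

For example, `encode_sequence("ACD", 5)` returns `[1, 2, 3, 0, 0]`. A fixed `max_len` trades some information loss on long sequences for a uniform input shape; other choices, such as one-hot vectors or learned embeddings, would fit the same interface.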
Keywords