Majallah-i Dānishgāh-i ̒Ulūm-i Pizishkī-i Qum (Jun 2020)
Identification of Heat-Resistant Bacteria Based on Selection of Proper Representation of Protein Sequences Using Deep Learning Approach
Abstract
Background and Objectives: Identifying the mechanisms that underlie heat resistance in bacteria is important for several industries, including food production, textile manufacturing, and especially detergent production. In this study, deep learning tools were used to identify the characteristics of heat-resistant bacteria based on protein properties. Methods: Characteristics of heat-resistant and non-heat-resistant proteins were calculated, including the structural properties of amino acids, the number and frequency of each amino acid, and their physicochemical properties. Bacterial classification was performed in three steps: first, attribute weighting methods were used to weight the variables; then, the important variables were selected based on those weights; and finally, deep learning networks were employed to extract the hierarchy of the features. Results: The results of 10 weighting methods showed that, of the 73 characteristics describing the number and frequency of amino acids, only 40 had weights higher than zero. Of these variables, 13 had weights higher than 0.5 and only 10 had weights above 0.09. These 10 features were selected as the important variables. The frequencies of glutamine and glutamic acid obtained the highest possible weights and were considered two important features for classifying heat-resistant and non-heat-resistant bacteria. The highest prediction accuracy of the deep learning networks was 92.42% for the classification of heat-resistant bacteria. Conclusion: Deep neural networks can be used effectively to identify heat-resistant bacteria based on their protein properties.
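The first two steps described in the Methods (computing the number and frequency of each amino acid, then keeping the variables whose weight exceeds a cutoff) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `amino_acid_composition` and `select_features`, the toy sequence, the feature names, and the example weights and cutoff are all hypothetical, and the weighting methods themselves and the deep learning classifier are not reproduced here.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return (counts, frequencies) for the 20 standard amino acids.

    Characters outside the standard alphabet are ignored, and
    frequencies are fractions of the counted residues.
    """
    tally = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(tally.values())
    counts = {aa: tally.get(aa, 0) for aa in AMINO_ACIDS}
    freqs = {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}
    return counts, freqs


def select_features(weights, threshold):
    """Return the names of features whose weight is strictly above threshold."""
    return sorted(name for name, w in weights.items() if w > threshold)


# Toy sequence, invented for illustration only (not from the study).
counts, freqs = amino_acid_composition("MKQEEQLLKE")

# Hypothetical weights standing in for the output of a weighting method.
example_weights = {"freq_E": 1.0, "freq_Q": 0.95, "freq_A": 0.1}
selected = select_features(example_weights, threshold=0.5)
```

In this sketch, `Q` and `E` are the one-letter codes for glutamine and glutamic acid, the two residues whose frequencies the abstract reports as the most highly weighted features.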