PhytoAFP: In Silico Approaches for Designing Plant-Derived Antifungal Peptides
Atul Tyagi,
Sudeep Roy,
Sanjay Singh,
Manoj Semwal,
Ajit K. Shasany,
Ashok Sharma,
Ivo Provazník
Affiliations
Atul Tyagi
Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, 61600 Brno, Czech Republic
Sudeep Roy
Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, 61600 Brno, Czech Republic
Sanjay Singh
Biotechnology Division, CSIR—Central Institute of Medicinal and Aromatic Plants, P.O.—CIMAP, Near Kukrail Picnic Spot, Lucknow 226 015, Uttar Pradesh, India
Manoj Semwal
Biotechnology Division, CSIR—Central Institute of Medicinal and Aromatic Plants, P.O.—CIMAP, Near Kukrail Picnic Spot, Lucknow 226 015, Uttar Pradesh, India
Ajit K. Shasany
Biotechnology Division, CSIR—Central Institute of Medicinal and Aromatic Plants, P.O.—CIMAP, Near Kukrail Picnic Spot, Lucknow 226 015, Uttar Pradesh, India
Ashok Sharma
Biotechnology Division, CSIR—Central Institute of Medicinal and Aromatic Plants, P.O.—CIMAP, Near Kukrail Picnic Spot, Lucknow 226 015, Uttar Pradesh, India
Ivo Provazník
Department of Biomedical Engineering, Faculty of Electrical Engineering and Communication, Brno University of Technology, Technicka 12, 61600 Brno, Czech Republic
Emerging infectious diseases (EIDs) caused by fungi are a serious problem in humans and plant species, and they pose a severe threat to food security worldwide. In this work, we developed support vector machine (SVM)-based models to predict and design therapeutic plant-derived antifungal peptides (PhytoAFP). Residue composition analysis shows a preference for the amino acids C, G, K, R, and S. Positional preference analysis shows that residues G, K, R, and A dominate the N-terminus, whereas residues N, S, C, and G are preferred at the C-terminus. Motif analysis reveals motifs such as NYVF, NYVFP, YVFP, NYVFPA, and VFPA. We developed two models on the main dataset using various input features, such as mono-, di-, and tripeptide composition, as well as binary, hybrid, and physicochemical properties. The TPC-based monopeptide composition model achieved the highest accuracy, 94.4%, with a Matthews correlation coefficient (MCC) of 0.89. The second-best model, based on dipeptide composition, achieved an accuracy of 94.28% with an MCC of 0.89 on the training dataset.
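To make the mono- and dipeptide composition features and the MCC metric concrete, the following minimal Python sketch computes them for a single peptide. It is an illustration only, not the authors' implementation: the function names are hypothetical, compositions are expressed as percentages (an assumption; the abstract does not state the normalization), and the SVM training step is omitted. The motif NYVFPA from the abstract serves purely as a toy input.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def monopeptide_composition(seq):
    """Percentage of each standard residue in seq (20 features)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return {aa: 100.0 * seq.count(aa) / len(seq) for aa in AMINO_ACIDS}


def dipeptide_composition(seq):
    """Percentage of each ordered residue pair in seq (400 features)."""
    seq = seq.upper()
    total = len(seq) - 1  # number of overlapping pairs
    if total < 1:
        raise ValueError("sequence needs at least two residues")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(total):
        pair = seq[i:i + 2]
        if pair in counts:  # pairs with non-standard residues are skipped
            counts[pair] += 1
    return {pair: 100.0 * n / total for pair, n in counts.items()}


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    peptide = "NYVFPA"  # toy input taken from the motif list
    mono = monopeptide_composition(peptide)
    di = dipeptide_composition(peptide)
    print(len(mono), len(di))  # feature vector sizes
    print(round(mono["N"], 2), di["NY"])
```

In a full pipeline, the dictionary values (taken in a fixed key order) would form the input vector for a classifier such as an SVM, and `mcc` would be evaluated on the resulting confusion matrix.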