Recent development of machine learning-based methods for the prediction of defensin family and subfamily

Phasit Charoenkwan; Nalini Schaduangrat; S. M. Hasan Mahmud; Orawit Thinnukool; Watshara Shoombuatong

doi:10.17179/excli2022-4913

EXCLI Journal : Experimental and Clinical Sciences (May 2022)

Affiliations

Phasit Charoenkwan: Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
Nalini Schaduangrat: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700
S. M. Hasan Mahmud: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700; Department of Computer Science, American International University-Bangladesh (AIUB), Kuratoli, Dhaka 1229, Bangladesh
Orawit Thinnukool: Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, Thailand, 50200
Watshara Shoombuatong: Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, Thailand, 10700. Phone: +66 2 441 4371; Fax: +66 2 441 4380; E-mail: [email protected]

Journal volume & issue: Vol. 21
pp. 757 – 771

Abstract

Nearly all living species possess host defense peptides called defensins, which are crucial for innate immunity. These peptides work by activating the immune system, which kills microbes directly or indirectly and thereby protects the host. Numerous peptide-based drugs are currently being evaluated in preclinical and clinical trials. Although experimental methods can precisely identify the defensin peptide family and subfamily, these approaches are often time-consuming and costly. Machine learning (ML) methods, on the other hand, can make effective use of protein sequence information without knowledge of a protein's three-dimensional structure, which makes them well suited to large-scale identification. To date, several ML methods have been developed for the in silico identification of the defensin peptide family and subfamily. A summary of the advantages and disadvantages of these existing methods is therefore urgently needed to provide useful suggestions for developing and improving computational models for this task. With this goal in mind, we provide a comprehensive survey of six state-of-the-art computational approaches for predicting the defensin peptide family and subfamily. We cover several important aspects, including dataset quality, feature encoding methods, feature selection schemes, ML algorithms, cross-validation methods and web server availability/usability. Moreover, we share our thoughts on the limitations of existing methods and on future perspectives for improving prediction performance and model interpretability. The insights and suggestions gained from this review are anticipated to serve as valuable guidance for researchers developing more robust and useful predictors.
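
As a minimal illustration of what sequence-only feature encoding can look like, the sketch below computes amino acid composition, a generic descriptor that turns a peptide sequence into a 20-dimensional numeric vector. It is not taken from any of the surveyed predictors; the function name, the alphabet ordering and the toy peptide are assumptions made purely for illustration.

```python
from collections import Counter

# The 20 standard amino acids in a fixed (alphabetical) order; the ordering
# is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in `sequence`.

    Non-standard characters are ignored when computing the fractions.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy peptide, not a real defensin sequence.
features = amino_acid_composition("ACDCRCKGG")
```

A vector of this kind needs only the primary sequence, so it can be passed to any standard classifier without structural information.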

Published in EXCLI Journal : Experimental and Clinical Sciences

ISSN: 1611-2156 (Online)
Publisher: IfADo - Leibniz Research Centre for Working Environment and Human Factors, Dortmund
Country of publisher: Germany
LCC subjects: Medicine: Internal medicine: Neoplasms. Tumors. Oncology. Including cancer and carcinogens; Science: Biology (General)
Website: http://www.excli.de