StackedEnC-AOP: prediction of antioxidant proteins using transform evolutionary and sequential features based multi-scale vector with stacked ensemble learning

Gul Rukh; Shahid Akbar; Gauhar Rehman; Fawaz Khaled Alarfaj; Quan Zou

doi:10.1186/s12859-024-05884-6

BMC Bioinformatics (Aug 2024)

Affiliations

Gul Rukh: Department of Zoology, Abdul Wali Khan University Mardan
Shahid Akbar: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China
Gauhar Rehman: Department of Zoology, Abdul Wali Khan University Mardan
Fawaz Khaled Alarfaj: Department of Management Information Systems (MIS), School of Business, King Faisal University (KFU)
Quan Zou: Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China

Journal volume & issue: Vol. 25, no. 1, pp. 1–22

Abstract

Background: Antioxidant proteins are involved in several biological processes and can protect DNA and cells from free-radical damage. These proteins regulate the body's oxidative stress and play a significant role in many antioxidant-based drugs. Current in vitro approaches are costly and time-consuming, and cannot efficiently screen for and identify the targeted motifs of antioxidant proteins.

Methods: We propose StackedEnC-AOP, a prediction method for discriminating antioxidant proteins. Training sequences are encoded by applying a two-level discrete wavelet transform (DWT) to the PSSM-based evolutionary matrix, treated as an image, to form a pseudo position-specific scoring matrix (PsePSSM-DWT) embedded vector. In addition, the evolutionary difference formula and composite physicochemical properties methods are used to collect structural and sequential descriptors. The sequential features, evolutionary descriptors, and physicochemical properties are then combined into a single vector to compensate for the weaknesses of the individual encoding schemes. To reduce the computational cost of the combined feature vector, the optimal features are selected using minimum redundancy and maximum relevance (mRMR). A stacking-based ensemble meta-model is trained on the optimal feature vector.

Results: StackedEnC-AOP achieved a prediction accuracy of 98.40% and an AUC of 0.99 on the training sequences. On an independent set, the trained model achieved an accuracy of 96.92% and an AUC of 0.98.

Conclusion: StackedEnC-AOP performed significantly better than current computational models, improving accuracy by ~5% on the training set and ~3% on the independent set. Its efficacy and consistency make it a valuable tool for data scientists, and it can play a key role in academic research and drug design.
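
As a rough illustration of the two-level DWT encoding described in the Methods, the following minimal sketch decomposes each column of a PSSM with a Haar wavelet and summarizes each sub-band. It is not the authors' implementation: the choice of the Haar wavelet, the four summary statistics (max, min, mean, standard deviation), and the function names `haar_step`, `summarize`, `dwt_features`, and `pssm_dwt_vector` are all assumptions made for this example, since the abstract does not specify them.

```python
import math
from statistics import mean, pstdev


def haar_step(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    if len(signal) % 2:
        # Assumption: pad odd-length signals by repeating the last value.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def summarize(coeffs):
    """Fixed-length summary of one sub-band (assumed statistics)."""
    return [max(coeffs), min(coeffs), mean(coeffs), pstdev(coeffs)]


def dwt_features(column, levels=2):
    """Decompose one PSSM column and summarize every sub-band."""
    feats = []
    approx = list(column)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats += summarize(detail)   # detail band at this level
    feats += summarize(approx)       # final approximation band
    return feats


def pssm_dwt_vector(pssm, levels=2):
    """Map an L x 20 PSSM (list of rows) to a fixed-length feature vector."""
    vec = []
    for col in zip(*pssm):           # iterate over the 20 amino-acid columns
        vec.extend(dwt_features(list(col), levels))
    return vec
```

With two levels there are three sub-bands per column, so a PSSM with 20 columns yields 3 × 4 × 20 = 240 features regardless of sequence length. In the pipeline the abstract describes, a vector of this kind would be concatenated with the other descriptors before mRMR selection and stacked-ensemble training.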

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
