Informatics in Medicine Unlocked (Jan 2022)

Accurate prediction of immunoglobulin proteins using machine learning model

  • Ali Ghulam,
  • Rahu Sikander,
  • Farman Ali,
  • Zar Nawab Khan Swati,
  • Ahsanullah Unar,
  • Dhani Bux Talpur

Journal volume & issue
Vol. 29
p. 100885

Abstract

Introduction: Immunoglobulins have attracted growing research interest in recent years. Immunoglobulins (Ig), also known as antibodies (Ab), are serum proteins produced by B cells that bind to specific antigens; the immune system uses them to identify and neutralize foreign objects such as bacteria and viruses. More than 100 years of research on the structure and function of immunoglobulins has underscored how complex these proteins are.

Methods: This research focuses on the molecular mechanisms that allow these numerous and diverse roles. Improving the accuracy of immunoglobulin classification with effective methods is therefore important for disease research. We extracted immunoglobulin features based on the BLOSUM62 score matrix and selected a feature set of reduced dimension. For classification, we built a model with Extreme Gradient Boosting (XGBoost), an ensemble learning method.

Results: The XGBoost model predicted immunoglobulins with an accuracy of 0.9727 under five-fold cross-validation. In addition, the best-performing set of mixed 2D key features, derived from BLOSUM62, distinguished immunoglobulins with an ROC AUC of 0.9700. The methods used in this paper are highly accurate at predicting immunoglobulins and at finding their key features.

Conclusions: The aim of this study was to predict immunoglobulins accurately and to identify their important features. The study employed the BLOSUM62 matrix, a widely used tool for scoring the alignment of two protein sequences; its values are derived from a large-scale analysis of observed polypeptide alignments. The best feature set derived from the BLOSUM62 matrix reliably predicted immunoglobulins with an accuracy of 0.9727 using the XGBoost classification model.
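
The abstract does not spell out how BLOSUM62 scores are turned into fixed-length features, so the following is only a minimal sketch of one common approach, not the authors' exact scheme: each residue is replaced by its 20-value BLOSUM62 row, and the rows are averaged over the sequence. The function name `blosum62_features` and the averaging step are assumptions made for illustration.

```python
# Standard BLOSUM62 scores for the 20 amino acids.
# Rows and columns follow the order given in AMINO_ACIDS.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""

# Map each amino acid letter to its row of 20 substitution scores.
BLOSUM62 = {
    aa: [int(x) for x in row.split()]
    for aa, row in zip(AMINO_ACIDS, _ROWS.strip().splitlines())
}


def blosum62_features(sequence):
    """Return a 20-value vector: the mean BLOSUM62 row over the sequence.

    Hypothetical helper for illustration. Non-standard residue codes
    (e.g. X, B, Z) are skipped rather than scored.
    """
    rows = [BLOSUM62[aa] for aa in sequence.upper() if aa in BLOSUM62]
    if not rows:
        raise ValueError("sequence contains no standard amino acids")
    n = len(rows)
    # zip(*rows) walks the 20 columns; each column is averaged over residues.
    return [sum(col) / n for col in zip(*rows)]


if __name__ == "__main__":
    # Toy peptide, not taken from the study's dataset.
    print(blosum62_features("EVQLVESGG"))
```

Vectors of this kind could then be stacked into a feature matrix and passed to a gradient-boosted classifier (for example `xgboost.XGBClassifier`, evaluated with scikit-learn's `cross_val_score` and `cv=5`), assuming those packages are installed; the hyperparameters and the feature selection step used in the study are not given in the abstract.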

Keywords