Informatics in Medicine Unlocked (Jan 2022)

Classification of imbalanced protein sequences with deep-learning approaches; application on influenza A imbalanced virus classes

  • Reza Ahsan,
  • Faezeh Ebrahimi,
  • Mansour Ebrahimi

Journal volume & issue
Vol. 29
p. 100860

Abstract

Machine-learning classifiers perform well on balanced data but struggle with imbalanced data, often merging or ignoring the rarer classes even when those classes are more important than the others. To address the imbalanced classes of influenza A virus, a Long Short-Term Memory (LSTM) architecture, which learns long-term dependencies, was developed and compared with conventional models on polynomial and time-matrix datasets. The tree induction and K-Nearest Neighbor models performed below 90% and did not accurately classify the classes with fewer samples. The proposed LSTM model predicted all classes and reached the highest possible figure of 100%. This is the first reported classification of an imbalanced influenza A virus dataset at the sequence level, and it paves the way for proteome-based classification of other proteins.

Keywords