IEEE Access (Jan 2021)

Auto-KPCA: A Two-Step Hybrid Feature Extraction Technique for Quantitative Structure–Activity Relationship Modeling

  • Shrooq A. Alsenan,
  • Isra M. Al-Turaiki,
  • Alaaeldin M. Hafez

DOI
https://doi.org/10.1109/ACCESS.2020.3047375
Journal volume & issue
Vol. 9
pp. 2466 – 2477

Abstract


Quantitative structure–activity relationship (QSAR) modeling is an established approach for drug discovery, but many QSAR datasets suffer from the curse of dimensionality, a challenge that is usually addressed with dimensionality reduction techniques such as principal component analysis (PCA). However, although linear feature extraction techniques have low computational cost and can handle linear relationships between descriptors, they cannot handle the complex structures found in QSAR data. Hybridization of feature extraction techniques is an effective approach to the challenges of high-dimensional datasets, and combining the benefits of two or more dimensionality reduction techniques has been successful in many fields. This paper proposes Auto-KPCA, a two-step hybrid feature extraction technique that leverages (i) the fast computational capability of kernel PCA (KPCA) and (ii) the performance of a deep generalized autoencoder in handling complex data structures. The proposed approach is compared with other feature extraction techniques on the same benchmark dataset in terms of classification accuracy. The capability of Auto-KPCA is then investigated further by testing four deep-learning classification models, namely a convolutional neural network, a recurrent neural network, a feedforward deep neural network, and a long short-term memory network. To the best of the authors' knowledge, this study is the first to investigate hybridization of KPCA and a deep generalized autoencoder in the context of QSAR. The reported results (i) provide insights into the behavior of different techniques in predicting class labels and (ii) demonstrate increased classification accuracy and noticeably decreased mean squared error when compared with KPCA and autoencoders used on their own.
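
To make the idea of a two-step hybrid concrete, the following is a minimal NumPy sketch and not the authors' implementation. Several things here are assumptions that the abstract does not specify: the order of the two steps (KPCA first, autoencoder second), the RBF kernel, every hyperparameter, and the random stand-in data. The autoencoder is a small single-hidden-layer network used as a stand-in, not the deep generalized autoencoder named in the abstract. All function names are hypothetical.

```python
import numpy as np


def rbf_kernel_pca(X, n_components, gamma):
    """Project the rows of X onto the leading RBF kernel principal components."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Center the kernel matrix in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centered = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(K_centered)
    top = np.argsort(eigvals)[::-1][:n_components]
    eigvals, eigvecs = eigvals[top], eigvecs[:, top]
    # Training-point projections are eigenvectors scaled by sqrt(eigenvalue).
    return eigvecs * np.sqrt(np.maximum(eigvals, 0.0))


def train_autoencoder(X, n_hidden, epochs=300, lr=0.05, seed=0):
    """Fit a one-hidden-layer tanh autoencoder with plain gradient descent.

    Returns (encode, losses): the encoder function and the per-epoch
    reconstruction mean squared error.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(scale=0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)           # encoder
        X_hat = H @ W2 + b2                # linear decoder
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        grad_out = 2.0 * err / err.size    # d(MSE) / d(X_hat)
        grad_pre = (grad_out @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ grad_out)
        b2 -= lr * grad_out.sum(axis=0)
        W1 -= lr * (X.T @ grad_pre)
        b1 -= lr * grad_pre.sum(axis=0)

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, losses


def auto_kpca_sketch(X, n_kpca=10, n_code=4):
    """Assumed order: step 1 is KPCA, step 2 is the autoencoder bottleneck."""
    Z = rbf_kernel_pca(X, n_components=n_kpca, gamma=1.0 / X.shape[1])
    encode, losses = train_autoencoder(Z, n_hidden=n_code)
    return encode(Z), losses


# Random stand-in for a descriptor matrix (60 compounds, 30 descriptors).
X_demo = np.random.default_rng(42).normal(size=(60, 30))
features, losses = auto_kpca_sketch(X_demo)
print(features.shape)
```

The extracted `features` would then be passed to a classifier of choice. In practice the descriptors would normally be scaled first and the kernel width tuned, neither of which is shown here.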

Keywords