Computational and Structural Biotechnology Journal (Dec 2024)
Topological embedding and directional feature importance in ensemble classifiers for multi-class classification
Abstract
Cancer is the second leading cause of disease-related death worldwide, and machine learning-based identification of novel biomarkers is crucial for improving the early detection and treatment of various cancers. A key challenge in applying machine learning to high-dimensional data is deriving important features in an interpretable manner that provides meaningful insights into the underlying biological mechanisms.

We developed a class-based directional feature importance (CLIFI) metric for decision tree methods and demonstrated its use on The Cancer Genome Atlas proteomics data. The CLIFI metric was incorporated into four algorithms: Random Forest (RF), LAtent VAriable Stochastic Ensemble of Trees (LAVASET), Gradient Boosted Decision Trees (GBDTs), and a new extension incorporating the LAVA step into GBDTs (LAVABOOST). Both LAVA methods incorporate topological information from protein interactions into the decision function.

The models' performance in classifying 28 cancers yielded F1-scores of 92.6% (RF), 92.0% (LAVASET), 89.3% (LAVABOOST) and 85.7% (GBDT), with no single method outperforming all others for individual cancer type prediction. The CLIFI metric enables visualisation of a model's decision-making functions. The resulting CLIFI value distributions indicated heterogeneity in the expression of several proteins (MYH11, ERα, BCL2) across different cancer types (including brain glioma, breast, kidney, thyroid and prostate cancer), aligning with the original raw expression data.

In conclusion, we have developed an integrated, directional feature importance metric for multi-class, decision tree-based classification models that facilitates interpretable feature importance assessment. The CLIFI metric can be combined with the incorporation of topological information into models' decision functions to introduce inductive bias, enhancing interpretability.
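The core idea of a class-based, directional feature importance for tree ensembles can be illustrated with a minimal sketch. Note that this is a hypothetical illustration, not the authors' CLIFI definition (which the abstract does not specify): the sign convention (whether a class's mean feature value lies above or below the cohort mean) and the use of scikit-learn's impurity-based importances are assumptions made here purely for demonstration.

```python
# Hypothetical sketch of a per-class, signed (directional) feature
# importance for a tree ensemble. NOT the paper's CLIFI metric: the
# sign convention and use of impurity-based importances are assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

def directional_importance(clf, X, y):
    """Signed per-class importance matrix of shape (n_classes, n_features).

    Magnitude comes from the ensemble's global (unsigned) importance;
    the sign encodes whether each class's mean feature value sits above
    (+) or below (-) the overall cohort mean for that feature.
    """
    imp = clf.feature_importances_            # global, unsigned importances
    centred = X - X.mean(axis=0)              # deviation from cohort mean
    signs = np.array([np.sign(centred[y == c].mean(axis=0))
                      for c in np.unique(y)])
    return signs * imp                        # one signed vector per class

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
D = directional_importance(clf, X, y)
print(D.shape)  # (3, 4): one signed importance vector per class
```

Visualising the rows of such a matrix (one per cancer type, in the paper's setting) is what allows the same protein to show opposite-direction contributions in different classes, mirroring the heterogeneity the abstract reports for MYH11, ERα and BCL2.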