On the Interpretability of Machine Learning Models and Experimental Feature Selection in Case of Multicollinear Data

Franc Drobnič; Andrej Kos; Matevž Pustišek

doi:10.3390/electronics9050761

Electronics (May 2020)

On the Interpretability of Machine Learning Models and Experimental Feature Selection in Case of Multicollinear Data

Franc Drobnič,
Andrej Kos,
Matevž Pustišek

Affiliations

Franc Drobnič: Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, 1000 Ljubljana, Slovenia
Andrej Kos: Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, 1000 Ljubljana, Slovenia
Matevž Pustišek: Faculty of Electrical Engineering, University of Ljubljana, Tržaška cesta 25, 1000 Ljubljana, Slovenia

DOI: https://doi.org/10.3390/electronics9050761
Journal volume & issue: Vol. 9, no. 5
p. 761

Abstract

Read online

In the field of machine learning, a considerable amount of research is involved in the interpretability of models and their decisions. The interpretability contradicts the model quality. Random Forests are among the best quality technologies of machine learning, but their operation is of “black box” character. Among the quantifiable approaches to the model interpretation, there are measures of association of predictors and response. In case of the Random Forests, this approach usually consists of calculating the model’s feature importances. Known methods, including the built-in one, are less suitable in settings with strong multicollinearity of features. Therefore, we propose an experimental approach to the feature selection task, a greedy forward feature selection method with least-trees-used criterion. It yields a set of most informative features that can be used in a machine learning (ML) training process with similar prediction quality as the original feature set. We verify the results of the proposed method on two known datasets, one with small feature multicollinearity and another with large feature multicollinearity. The proposed method also allows for a domain expert help with selecting among equally important features, which is known as the human-in-the-loop approach.

Published in Electronics

ISSN: 2079-9292 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics
Website: http://www.mdpi.com/journal/electronics

About the journal

Abstract

Keywords