IEEE Access (Jan 2020)
An Explainable Artificial Intelligence Model for Clustering Numerical Databases
Abstract
The machine learning community is increasingly advocating understandable models over black-box models. The main reason is that experts in application areas are reluctant to adopt black-box models because they cannot understand them and, consequently, find their results difficult to explain. In unsupervised problems, where experts have not labeled the objects, an explanation of the results is necessary: specialists in the application area need to understand both the applied model and the obtained results in order to find the practical rationale behind each clustering. Hence, in this paper, we introduce a clustering algorithm based on decision trees (eUD3.5), which builds several decision trees from numerical databases. Unlike previous solutions, our proposal takes into account both separation and compactness when evaluating a feature split, without decreasing time efficiency and with no empirical parameter to control the depth of the trees. We tested eUD3.5 on 40 numerical databases from the UCI Machine Learning Repository, showing that our proposal builds a set of high-quality unsupervised decision trees for clustering and obtains the best average ranking compared with other popular state-of-the-art clustering solutions. In addition, from the collection of unsupervised decision trees induced by our proposal, a set of high-quality patterns is extracted to show the main feature-value pairs describing each cluster.
Keywords