An Explainable Artificial Intelligence Model for Clustering Numerical Databases

Octavio Loyola-Gonzalez; Andres Eduardo Gutierrez-Rodriguez; Miguel Angel Medina-Perez; Raul Monroy; Jose Francisco Martinez-Trinidad; Jesus Ariel Carrasco-Ochoa; Milton Garcia-Borroto

doi:10.1109/ACCESS.2020.2980581

IEEE Access (Jan 2020)

Affiliations

Octavio Loyola-Gonzalez: ORCiD; Tecnologico de Monterrey, Puebla, Mexico
Andres Eduardo Gutierrez-Rodriguez: ORCiD; Tecnologico de Monterrey, San Antonio Buenavista, Mexico
Miguel Angel Medina-Perez: ORCiD; Tecnologico de Monterrey, Estado de Mexico, Mexico
Raul Monroy: Tecnologico de Monterrey, Estado de Mexico, Mexico
Jose Francisco Martinez-Trinidad: ORCiD; Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
Jesus Ariel Carrasco-Ochoa: ORCiD; Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico
Milton Garcia-Borroto: ORCiD; Instituto Superior Politécnico José Antonio Echeverría, Habana, Cuba

DOI: https://doi.org/10.1109/ACCESS.2020.2980581
Journal volume & issue: Vol. 8
pp. 52370 – 52384

Abstract

The machine learning research community is increasingly advocating understandable models over black-box models. The main reason is that experts in application areas are reluctant to adopt black-box models because they cannot understand them and, consequently, find the results difficult to explain. In unsupervised problems, where experts have not labeled the objects, an explanation of the results is necessary because specialists in the application area need to understand both the applied model and the obtained results in order to find the rationale behind each clustering from a practical point of view. Hence, in this paper, we introduce a clustering algorithm based on decision trees (eUD3.5), which builds several decision trees from numerical databases. Unlike previous solutions, our proposal takes into account both separation and compactness when evaluating a feature split, without decreasing time efficiency and with no empirical parameter to control the depth of the trees. We tested eUD3.5 on 40 numerical databases from the UCI Machine Learning Repository, showing that our proposal builds a set of high-quality unsupervised decision trees for clustering and obtains the best average ranking compared with other popular state-of-the-art clustering solutions. In addition, from the collection of unsupervised decision trees induced by our proposal, a set of high-quality patterns is extracted to show the main feature-value pairs describing each cluster.
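
The idea of scoring a feature split by both separation and compactness can be illustrated with a minimal sketch. The abstract does not give the eUD3.5 criterion, so the score below is an assumption chosen for illustration only: separation is taken as the gap between the two group means, and compactness as the average within-group standard deviation. The function names split_score and best_split are hypothetical.

```python
from statistics import fmean, pstdev


def split_score(values, threshold):
    """Score one candidate split of a numerical feature (higher is better).

    Illustrative criterion only, NOT the eUD3.5 formula:
    separation  = distance between the means of the two groups
    compactness = average population standard deviation inside the groups
    """
    left = [v for v in values if v <= threshold]
    right = [v for v in values if v > threshold]
    if not left or not right:
        # A valid split must produce two non-empty groups.
        return float("-inf")
    separation = abs(fmean(right) - fmean(left))
    spread = (pstdev(left) + pstdev(right)) / 2.0
    # Small constant avoids division by zero when both groups are constant.
    return separation / (spread + 1e-12)


def best_split(values):
    """Return the best threshold among midpoints of consecutive distinct values."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None  # a constant feature cannot be split
    return max(candidates, key=lambda t: split_score(values, t))
```

For a feature with two well-separated groups of values, such as [1.0, 1.1, 0.9, 5.0, 5.2, 4.8], this sketch selects a threshold in the gap between the groups. An unsupervised decision tree would apply a criterion of this kind recursively over all features; how eUD3.5 does so, and how it stops without a depth parameter, is described in the paper itself.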

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
