A Trust-Based Methodology to Evaluate Deep Learning Models for Automatic Diagnosis of Ocular Toxoplasmosis from Fundus Images
Rodrigo Parra,
Verena Ojeda,
Jose Luis Vázquez Noguera,
Miguel García-Torres,
Julio César Mello-Román,
Cynthia Villalba,
Jacques Facon,
Federico Divina,
Olivia Cardozo,
Verónica Elisa Castillo,
Ingrid Castro Matto
Affiliations
Rodrigo Parra
Centro de Investigación, Universidad Americana, Avenida Brasilia 1100, Asunción 1206, Paraguay
Verena Ojeda
Centro de Investigación, Universidad Americana, Avenida Brasilia 1100, Asunción 1206, Paraguay
Jose Luis Vázquez Noguera
Centro de Investigación, Universidad Americana, Avenida Brasilia 1100, Asunción 1206, Paraguay
Miguel García-Torres
Data Science and Big Data Lab., Universidad Pablo de Olavide, ES-41013 Seville, Spain
Julio César Mello-Román
Centro de Investigación, Universidad Americana, Avenida Brasilia 1100, Asunción 1206, Paraguay
Cynthia Villalba
Facultad Politécnica, Universidad Nacional de Asunción, San Lorenzo 2169, Paraguay
Jacques Facon
Department of Computer and Electronics, Universidade Federal do Espírito Santo, São Mateus 29932-540, Brazil
Federico Divina
Data Science and Big Data Lab., Universidad Pablo de Olavide, ES-41013 Seville, Spain
Olivia Cardozo
Department of Ophthalmology, Hospital General Pediátrico Niños de Acosta Ñu, San Lorenzo 2169, Paraguay
Verónica Elisa Castillo
Departamento de Retina, Cátedra de Oftalmología, Hospital de Clínicas, Facultad de Ciencias Médicas, Universidad Nacional de Asunción, San Lorenzo 2169, Paraguay
Ingrid Castro Matto
Departamento de Retina, Cátedra de Oftalmología, Hospital de Clínicas, Facultad de Ciencias Médicas, Universidad Nacional de Asunción, San Lorenzo 2169, Paraguay
Abstract
Deep Learning (DL) has emerged as a powerful and promising approach for the automatic diagnosis of ocular toxoplasmosis (OT). However, despite the good predictive performance of these models, their decision rules should be interpretable in order to elicit trust from the medical community. The development of an evaluation methodology that assesses DL models through interpretability methods is therefore a challenging but necessary step toward extending the use of AI among clinicians. In this work, we propose a novel methodology to quantify the similarity between the decision rules used by a DL model and those used by an ophthalmologist, under the assumption that doctors are more likely to trust a prediction derived from decision rules they can understand. Given an eye fundus image with OT, the proposed methodology compares the segmentation mask of OT lesions labeled by an ophthalmologist with the attribution matrix produced by an interpretability method. Furthermore, an open dataset that includes the eye fundus images and the segmentation masks is shared with the community. The proposal was tested on three DL architectures. The results suggest that more complex models tend to perform worse in terms of their likelihood of being trusted, while achieving better sensitivity and specificity.
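As a minimal illustration of the comparison at the core of the methodology, the sketch below contrasts an expert-labeled lesion mask with a model attribution map. The function name trust_score, the quantile threshold on the attribution values, and the IoU-style overlap score are assumptions made for this sketch; the paper's exact similarity measure may differ.

import numpy as np

def trust_score(mask, attribution, quantile=0.95):
    # `mask` is a binary (H, W) lesion mask labeled by an ophthalmologist;
    # `attribution` is a real-valued (H, W) map produced by an
    # interpretability method (e.g., saliency or Grad-CAM).
    attr = np.abs(attribution)
    # Keep only the most strongly attributed pixels (hypothetical rule).
    salient = attr >= np.quantile(attr, quantile)
    expert = mask.astype(bool)
    intersection = np.logical_and(salient, expert).sum()
    union = np.logical_or(salient, expert).sum()
    # IoU-style overlap: 1.0 when the model attends exactly to the lesion.
    return float(intersection) / union if union > 0 else 0.0

# Usage on synthetic data: a square lesion and a random attribution map.
rng = np.random.default_rng(0)
mask = np.zeros((64, 64))
mask[20:30, 20:30] = 1
print(round(trust_score(mask, rng.normal(size=(64, 64))), 3))

A higher score indicates that the pixels the model relies on coincide with the lesions the ophthalmologist marked, which is the sense in which a prediction is deemed more likely to be trusted.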