Classification Confidence in Exploratory Learning: A User’s Guide

Peter Salamon; David Salamon; V. Adrian Cantu; Michelle An; Tyler Perry; Robert A. Edwards; Anca M. Segall

doi:10.3390/make5030043

Machine Learning and Knowledge Extraction (Jul 2023)

Affiliations

Peter Salamon: Department of Mathematics, San Diego State University, San Diego, CA 92182, USA
David Salamon: Department of Mathematics, San Diego State University, San Diego, CA 92182, USA
V. Adrian Cantu: Computational Science Research Center, San Diego State University, San Diego, CA 92182, USA
Michelle An: Bioinformatics and Medical Informatics Program, San Diego State University, San Diego, CA 92182, USA
Tyler Perry: Computational Science Research Center, San Diego State University, San Diego, CA 92182, USA
Robert A. Edwards: Flinders Accelerator for Microbiome Exploration, Flinders University, Flinders, Adelaide, SA 5001, Australia
Anca M. Segall: Department of Biology, San Diego State University, San Diego, CA 92182, USA

DOI: https://doi.org/10.3390/make5030043
Journal volume & issue: Vol. 5, no. 3, pp. 803–829

Abstract

This paper investigates the post-hoc calibration of confidence for “exploratory” machine learning classification problems. The difficulty in these problems stems from the continuing desire, when curating datasets, to push the boundaries of which categories have enough examples to generalize from, and from confusion regarding the validity of those categories. We argue that for such problems the “one-versus-all” approach (top-label calibration) must be used rather than the “calibrate-the-full-response-matrix” approach advocated elsewhere in the literature. We introduce and test four new algorithms designed to handle the idiosyncrasies of category-specific confidence estimation using only the test set and the final model. Chief among these methods is the use of kernel density ratios for confidence calibration, including a novel algorithm for choosing the bandwidth. We test our claims and explore the limits of calibration on a bioinformatics application (PhANNs) as well as the classic MNIST benchmark. Finally, our analysis argues that post-hoc calibration should always be performed, may be performed using only the test dataset, and should be sanity-checked visually.
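
As a rough illustration of the kernel density ratio idea mentioned in the abstract, the sketch below estimates a calibrated confidence for one predicted category from test-set scores alone. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`gaussian_kernel_sum`, `kernel_ratio_confidence`), the Gaussian kernel, and the fixed `bandwidth` value are all hypothetical choices made here, and the paper's bandwidth-selection algorithm is not reproduced.

```python
import numpy as np


def gaussian_kernel_sum(points, query, bandwidth):
    """Sum of Gaussian kernels centred on `points`, evaluated at `query`.

    This equals (number of points) * (Gaussian KDE), so class counts are
    carried along automatically when two such sums are compared.
    """
    points = np.asarray(points, dtype=float)
    query = np.atleast_1d(np.asarray(query, dtype=float))
    if points.size == 0:
        return np.zeros_like(query)
    z = (query[:, None] - points[None, :]) / bandwidth
    norm = bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm


def kernel_ratio_confidence(scores, correct, query, bandwidth=0.05):
    """Estimate P(prediction is correct | score) for one predicted category.

    scores    : model scores of test items assigned to this category
    correct   : boolean array, True where that assignment was right
    query     : score(s) at which to evaluate the calibrated confidence
    bandwidth : ASSUMED fixed kernel width (illustrative only)
    """
    scores = np.asarray(scores, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    hits = gaussian_kernel_sum(scores[correct], query, bandwidth)
    misses = gaussian_kernel_sum(scores[~correct], query, bandwidth)
    total = hits + misses
    # Far from every test point both sums underflow to zero; report NaN there.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(total > 0, hits / total, np.nan)


# Toy data (made up for illustration): high scores were right, low ones wrong.
toy_scores = [0.90, 0.95, 0.92, 0.30, 0.35]
toy_correct = [True, True, True, False, False]
print(kernel_ratio_confidence(toy_scores, toy_correct, [0.32, 0.93]))
```

Working with kernel sums rather than normalized densities keeps the numbers of correct and incorrect predictions in the ratio, so the result can be read as a one-versus-all (top-label) confidence for that category. The estimate is sensitive to `bandwidth`, which is why a visual sanity check of the resulting curve is advisable.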

Published in Machine Learning and Knowledge Extraction

ISSN: 2504-4990 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering: Electronics: Computer engineering. Computer hardware
Website: https://www.mdpi.com/journal/make
