Extensions of the External Validation for Checking Learned Model Interpretability and Generalizability

Sung Yang Ho; Kimberly Phua; Limsoon Wong; Wilson Wen Bin Goh

Patterns (Nov 2020)

Affiliations

Sung Yang Ho: School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
Kimberly Phua: School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore
Limsoon Wong: Department of Computer Science, National University of Singapore, Singapore 117417, Singapore; Corresponding author
Wilson Wen Bin Goh: School of Biological Sciences, Nanyang Technological University, Singapore 637551, Singapore; Corresponding author

Journal volume & issue: Vol. 1, no. 8, p. 100129

Abstract

Summary: We discuss the validation of machine learning models, which is standard practice for determining model efficacy and generalizability. We argue that internal validation approaches, such as cross-validation and bootstrap, cannot guarantee the quality of a machine learning model because the training data may be biased and the validation procedure itself is complex. To better evaluate the generalization ability of a learned model, we suggest using independently sourced external data as validation datasets, an approach known as external validation. Because external validation has attracted little research attention, and a well-structured, comprehensive study is lacking, we discuss the necessity of external validation and propose two extensions of the approach that may help reveal the true domain-relevant model from a candidate set. We also suggest a procedure for checking whether a set of validation datasets is valid and introduce statistical reference points for detecting problems in external data.

The Bigger Picture: External validation is critical for establishing machine learning model quality. To improve rigor and introduce structure into external validation processes, we propose two extensions, convergent and divergent validation. Using a case study, we demonstrate how convergent and divergent validations are set up and discuss technical considerations for gauging performance, including establishing statistical rigor, acquiring valid external data, determining how many times an external validation needs to be performed, and deciding what to do when multiple external validations disagree with each other. Finally, we highlight that external validation remains, and will continue to be, highly relevant, even for new machine learning paradigms.

Published in Patterns

ISSN: 2666-3899 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science: Computer software
Website: https://www.cell.com/patterns
