Explainable text-tabular models for predicting mortality risk in companion animals

James Burton; Sean Farrell; Peter-John Mäntylä Noble; Noura Al Moubayed

doi:10.1038/s41598-024-64551-1

Scientific Reports (Jun 2024)

Affiliations

James Burton: Department of Computer Science, Durham University
Sean Farrell: Department of Computer Science, Durham University
Peter-John Mäntylä Noble: Institute of Infection, Veterinary and Ecological Sciences, University of Liverpool
Noura Al Moubayed: Department of Computer Science, Durham University

DOI: https://doi.org/10.1038/s41598-024-64551-1
Journal volume & issue: Vol. 14, no. 1, pp. 1–12

Abstract

As interest in using machine learning models to support clinical decision-making increases, explainability is an unequivocal priority for clinicians, researchers and regulators, who need to comprehend and trust model results. With many clinical datasets containing a range of modalities, from the free text of clinician notes to structured tabular data entries, there is a need for frameworks capable of providing comprehensive explanation values across diverse modalities. Here, we present a multimodal masking framework that extends the reach of SHapley Additive exPlanations (SHAP) to text and tabular datasets, and we use it to identify risk factors for companion animal mortality in first-opinion veterinary electronic health records (EHRs) from across the United Kingdom. The framework is designed to treat each modality consistently, ensuring uniform treatment of features and thereby fostering predictability in unimodal and multimodal contexts. We present five multimodality approaches, with the best-performing method utilising PetBERT, a language model pre-trained on a veterinary dataset. Utilising our framework, we shed light for the first time on why each model makes its decisions and find that PetBERT engages more with free-text narratives, whereas BERT-base predominantly emphasises tabular data. The investigation also explores important features at a more granular level, identifying distinct words and phrases that substantially influenced an animal’s life status prediction. PetBERT showed a heightened ability to grasp phrases associated with veterinary clinical nomenclature, signalling the value of additional pre-training of language models.
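
The idea of one mask applied consistently across text and tabular inputs can be illustrated with a minimal sketch. It is not the authors' implementation and does not use the SHAP library: the scoring function `toy_model`, the field names, the use of `None` and "[MASK]" as masked values, and the example record are all hypothetical, and exact Shapley values are enumerated by brute force, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def toy_model(tabular, tokens):
    """Hypothetical risk score standing in for a trained text-tabular model."""
    score = 0.0
    if tabular.get("age_years") is not None:
        score += 0.05 * tabular["age_years"]
    if "lethargic" in tokens:
        score += 0.3
    if "collapse" in tokens:
        score += 0.4
    return score


def masked_predict(tabular, tokens, keep):
    """Score the record with every feature whose index is not in `keep` masked.

    Indices 0..len(tabular)-1 refer to tabular fields and the remaining
    indices to text tokens, so one mask covers both modalities.
    """
    names = list(tabular)
    n_tab = len(names)
    # Assumption: a masked tabular value is None, a masked token is "[MASK]".
    masked_tab = {
        name: (tabular[name] if i in keep else None)
        for i, name in enumerate(names)
    }
    masked_tok = [
        tok if (n_tab + j) in keep else "[MASK]"
        for j, tok in enumerate(tokens)
    ]
    return toy_model(masked_tab, masked_tok)


def shapley_values(tabular, tokens):
    """Exact Shapley value of each feature, by enumerating all subsets."""
    n = len(tabular) + len(tokens)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                keep = set(subset)
                gain = (
                    masked_predict(tabular, tokens, keep | {i})
                    - masked_predict(tabular, tokens, keep)
                )
                values[i] += weight * gain
    return values


# Hypothetical record: two tabular fields followed by three note tokens.
record_tabular = {"age_years": 12, "weight_kg": 30}
record_tokens = ["dog", "lethargic", "collapse"]
values = shapley_values(record_tabular, record_tokens)
labels = list(record_tabular) + record_tokens
for label, value in zip(labels, values):
    print(f"{label}: {value:.3f}")
```

Because tabular fields and tokens share one index space, their attributions are directly comparable and sum to the difference between the fully unmasked and fully masked predictions. In practice the number of subsets grows exponentially, so Shapley values are approximated by sampling rather than enumerated.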

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/
