Leveraging large language models for patient-ventilator asynchrony detection

Lluis Blanch; Francesc Suñol; Candelaria de Haro; Verónica Santos-Pulpón; Sol Fernández-Gonzalo; Josefina López-Aguilar; Leonardo Sarlabous

doi:10.1136/bmjhci-2024-101426

BMJ Health & Care Informatics (Jun 2025)

Affiliations

Lluis Blanch: Critical Care Center, Hospital Universitari Parc Taulí, Institut d’Investigació i Innovació Parc Taulí I3PT, Universitat Autònoma de Barcelona, Sabadell, Spain
Francesc Suñol: Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
Candelaria de Haro: Critical Care Department, Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
Verónica Santos-Pulpón: Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
Sol Fernández-Gonzalo: Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
Josefina López-Aguilar: Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain
Leonardo Sarlabous: Institut d’Investigació i Innovació Parc Taulí (I3PT-CERCA), Sabadell, Spain

DOI: https://doi.org/10.1136/bmjhci-2024-101426
Journal volume & issue: Vol. 32, no. 1

Abstract

Objectives: To evaluate whether large language models (LLMs) can achieve performance comparable to expert-developed deep neural networks in detecting flow starvation (FS) asynchronies during mechanical ventilation.

Methods: Popular LLMs (GPT-4, Claude-3.5, Gemini-1.5, DeepSeek-R1) were tested on a dataset of 6500 airway pressure cycles from 28 patients, classifying breaths into three FS categories. They were also tasked with generating executable code for one-dimensional convolutional neural networks (CNN-1D) and long short-term memory (LSTM) networks. Model performance was assessed using repeated holdout validation and compared with expert-developed models.

Results: LLMs performed poorly in direct FS classification (accuracy: GPT-4: 0.497; Claude-3.5: 0.627; Gemini-1.5: 0.544; DeepSeek-R1: 0.520). However, Claude-3.5-generated CNN-1D code achieved the highest accuracy (0.902 (0.899–0.906)), outperforming expert-developed models.

Discussion: LLMs demonstrated limited capability in direct classification but excelled at generating effective neural network models with minimal human intervention. This suggests that LLMs could accelerate model development for clinical applications, particularly for detecting patient-ventilator asynchronies, though their clinical implementation requires further validation and consideration of ethical factors.
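To illustrate the repeated holdout validation mentioned in the Methods, the following is a minimal sketch rather than the authors' pipeline. Everything in it is an assumption made for illustration: the synthetic three-class "cycles", the nearest-centroid stand-in classifier (the study used CNN-1D and LSTM models), the 80/20 split at the level of individual cycles, the 20 repeats, and the median-with-quartiles summary. The abstract does not state which of these choices the study made.

```python
import random
import statistics


def repeated_holdout(samples, labels, fit, predict,
                     n_repeats=20, test_fraction=0.2, seed=0):
    """Return one test accuracy per random train/test split."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    n_test = max(1, int(len(samples) * test_fraction))
    accuracies = []
    for _ in range(n_repeats):
        rng.shuffle(indices)  # new random partition on every repeat
        test_idx, train_idx = indices[:n_test], indices[n_test:]
        model = fit([samples[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / n_test)
    return accuracies


def fit_centroids(samples, labels):
    """Hypothetical stand-in classifier: the mean waveform of each class."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {y: [statistics.fmean(col) for col in zip(*xs)]
            for y, xs in grouped.items()}


def predict_centroid(centroids, x):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    return min(centroids,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


def make_synthetic_cycles(n_per_class=100, length=32, seed=1):
    """Synthetic data only: three classes that differ by a constant offset."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label in range(3):
        for _ in range(n_per_class):
            samples.append([label + rng.gauss(0, 1.0) for _ in range(length)])
            labels.append(label)
    return samples, labels


samples, labels = make_synthetic_cycles()
accs = repeated_holdout(samples, labels, fit_centroids, predict_centroid)
q1, median, q3 = statistics.quantiles(accs, n=4)
print(f"accuracy: {median:.3f} ({q1:.3f}-{q3:.3f})")
```

Because the model is refit on a fresh random split in every repeat, the spread of the resulting accuracies gives a rough sense of how sensitive the estimate is to the particular partition.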

Published in BMJ Health & Care Informatics

ISSN: 2632-1009 (Online)
Publisher: BMJ Publishing Group
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://informatics.bmj.com/
