Utilizing a domain-specific large language model for LI-RADS v2018 categorization of free-text MRI reports: a feasibility study

Mario Matute-González; Anna Darnell; Marc Comas-Cufí; Javier Pazó; Alexandre Soler; Belén Saborido; Ezequiel Mauro; Juan Turnes; Alejandro Forner; María Reig; Jordi Rimola

doi:10.1186/s13244-024-01850-1

Insights into Imaging (Nov 2024)

Affiliations

Mario Matute-González: BCLC Group, Radiology Department, Hospital Clínic of Barcelona, IDIBAPS
Anna Darnell: BCLC Group, Radiology Department, Hospital Clínic of Barcelona, IDIBAPS
Marc Comas-Cufí: Computer Science, Applied Mathematics and Statistics Department, University of Girona
Javier Pazó: Information Technology Department, Spanish Association for the Study of the Liver
Alexandre Soler: BCLC Group, Radiology Department, Hospital Clínic of Barcelona, IDIBAPS
Belén Saborido: BCLC Group, Fundació Clínic per la Recerca Biomèdica—IDIBAPS
Ezequiel Mauro: BCLC Group, Liver Unit, Hospital Clínic of Barcelona, Fundació Clínic per a la Recerca Biomédica (FCRB), IDIBAPS, University of Barcelona
Juan Turnes: Gastroenterology and Hepatology, Pontevedra University Hospital Complex
Alejandro Forner: BCLC Group, Liver Unit, Hospital Clínic of Barcelona, Fundació Clínic per a la Recerca Biomédica (FCRB), IDIBAPS, University of Barcelona
María Reig: BCLC Group, Liver Unit, Hospital Clínic of Barcelona, Fundació Clínic per a la Recerca Biomédica (FCRB), IDIBAPS, University of Barcelona
Jordi Rimola: BCLC Group, Radiology Department, Hospital Clínic of Barcelona, IDIBAPS

Journal volume & issue: Vol. 15, no. 1
pp. 1 – 11

Abstract

Objective: To develop a domain-specific large language model (LLM) for LI-RADS v2018 categorization of hepatic observations based on free-text descriptions extracted from MRI reports.

Material and methods: This retrospective study included 291 small liver observations, divided into training (n = 141), validation (n = 30), and test (n = 120) datasets. Of these, 120 were fictitious, and 171 were extracted from 175 MRI reports from a single institution. The algorithm's performance was compared to two independent radiologists and one hepatologist in a human replacement scenario, and considering two combined strategies (double reading with arbitration and triage). Agreement on LI-RADS category and dichotomic malignancy (LR-4, LR-5, and LR-M) were estimated using linear-weighted κ statistics and Cohen's κ, respectively. Sensitivity and specificity for LR-5 were calculated. The consensus agreement of three other radiologists served as the ground truth.

Results: The model showed moderate agreement against the ground truth for both LI-RADS categorization (κ = 0.54 [95% CI: 0.42–0.65]) and the dichotomized approach (κ = 0.58 [95% CI: 0.42–0.73]). Sensitivity and specificity for LR-5 were 0.76 (95% CI: 0.69–0.86) and 0.96 (95% CI: 0.91–1.00), respectively. When the chatbot was used as a triage tool, performance improved for LI-RADS categorization (κ = 0.86/0.87 for the two independent radiologists and κ = 0.76 for the hepatologist), dichotomized malignancy (κ = 0.94/0.91 and κ = 0.87) and LR-5 identification (1.00/0.98 and 0.85 sensitivity, 0.96/0.92 and 0.92 specificity), with no statistical significance compared to the human readers' individual performance. Through this strategy, the workload decreased by 45%.

Conclusion: LI-RADS v2018 categorization from unlabelled MRI reports is feasible using our LLM, and it enhances the efficiency of data curation.

Critical relevance statement: Our proof-of-concept study provides novel insights into the potential applications of LLMs, offering a real-world example of how these tools could be integrated into a local workflow to optimize data curation for research purposes.

Key Points:
Automatic LI-RADS categorization from free-text reports would be beneficial to workflow and data mining.
LiverAI, a GPT-4-based model, supported various strategies improving data curation efficiency by up to 60%.
LLMs can integrate into workflows, significantly reducing radiologists' workload.
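
For readers unfamiliar with the agreement metrics named in the abstract, the following is a minimal sketch of how linear-weighted κ, Cohen's κ on the dichotomized malignancy label, and LR-5 sensitivity and specificity can be computed. It is not the authors' analysis code. The six labels are invented toy data, the function names are hypothetical, and the ordinal scale LR-1 to LR-5 is an assumption; the abstract does not say where LR-M sits on the ordinal scale, so the weighted example leaves it out.

```python
from collections import Counter

# Assumed ordinal scale for the weighted example (LR-M deliberately omitted).
ORDINAL = ["LR-1", "LR-2", "LR-3", "LR-4", "LR-5"]
# Dichotomization stated in the abstract: LR-4, LR-5 and LR-M count as malignant.
MALIGNANT = {"LR-4", "LR-5", "LR-M"}


def kappa(rater_a, rater_b, categories, linear=False):
    """Cohen's kappa; linear=True uses linear disagreement weights |i - j| / (k - 1)."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)
    pairs = Counter((index[a], index[b]) for a, b in zip(rater_a, rater_b))
    marg_a = Counter(index[a] for a in rater_a)
    marg_b = Counter(index[b] for b in rater_b)

    def weight(i, j):
        if linear:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(weight(i, j) * pairs[(i, j)] / n for i, j in cells)
    expected = sum(weight(i, j) * marg_a[i] * marg_b[j] / n**2 for i, j in cells)
    return 1.0 - observed / expected


def sensitivity_specificity(truth, predicted, positive="LR-5"):
    """Sensitivity and specificity for one category treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(truth, predicted))
    tn = sum(t != positive and p != positive for t, p in zip(truth, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(truth, predicted))
    return tp / (tp + fn), tn / (tn + fp)


def dichotomize(labels):
    """Map LI-RADS categories to a binary malignancy label."""
    return ["malignant" if c in MALIGNANT else "other" for c in labels]


# Toy labels, invented for illustration only.
truth = ["LR-5", "LR-5", "LR-4", "LR-3", "LR-3", "LR-2"]
model = ["LR-5", "LR-4", "LR-4", "LR-3", "LR-2", "LR-2"]

kappa_linear = kappa(truth, model, ORDINAL, linear=True)
kappa_binary = kappa(dichotomize(truth), dichotomize(model), ["other", "malignant"])
sens, spec = sensitivity_specificity(truth, model)
print(round(kappa_linear, 3), round(kappa_binary, 3), sens, spec)
```

In this toy example both disagreements are between adjacent categories, so the linear weights penalize them only lightly, and neither crosses the malignancy threshold, so the dichotomized labels agree completely.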

Published in Insights into Imaging

ISSN: 1869-4101 (Online)
Publisher: SpringerOpen
Country of publisher: Germany
LCC subjects: Medicine: Medicine (General): Medical physics. Medical radiology. Nuclear medicine
Website: http://www.springer.com/13244
