PLOS Digital Health (Jul 2025)

Using large language models to extract information from pediatric clinical reports.

  • Katharina Danhauser,
  • Yingding Wang,
  • Christoph Klein,
  • Uta Tacke,
  • Larissa Mantoan,
  • Laura Aurica Ritter,
  • Florian Heinen,
  • Chiara Nobile,
  • Moritz Tacke

DOI
https://doi.org/10.1371/journal.pdig.0000919
Journal volume & issue
Vol. 4, no. 7
p. e0000919

Abstract


Most medical documentation, including clinical reports, exists in unstructured formats that hinder efficient data analysis and integration into decision-support systems for patient care and research. Both areas could profit significantly from reliable automatic analysis of these documents, but current methods for extracting data from them are labor-intensive and inflexible. Large language models (LLMs) offer a promising, flexible alternative for transforming unstructured medical documents into structured data. This study assesses the performance of nine LLMs in extracting structured data from pediatric clinical reports. The results demonstrate that both commercial and open-source LLMs can identify patient-specific information with high accuracy, with top-performing models exceeding 90% accuracy on key tasks.
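To illustrate the general approach the abstract describes — prompting an LLM to turn free-text reports into structured records — the sketch below builds a JSON-schema prompt and robustly parses the model's reply. This is a minimal illustration, not the authors' pipeline: the field names, the example report, and the `mock_llm` stand-in (which replaces a real API call) are all hypothetical.

```python
import json

# Hypothetical target fields -- illustrative only, not taken from the study.
FIELDS = ["patient_age", "diagnosis", "medications"]

def build_prompt(report_text: str) -> str:
    """Ask the model to answer with a single JSON object holding the target fields."""
    return (
        "Extract the following fields from the clinical report and answer "
        f"with a single JSON object using exactly these keys: {FIELDS}. "
        "Use null for fields that are not mentioned.\n\n"
        f"Report:\n{report_text}"
    )

def parse_response(raw: str) -> dict:
    """Parse the model's reply, tolerating surrounding prose or code fences."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    record = json.loads(raw[start : end + 1])
    # Keep only the requested keys so downstream code sees a fixed schema.
    return {key: record.get(key) for key in FIELDS}

def mock_llm(prompt: str) -> str:
    """Stand-in for a real LLM API call; returns a canned, slightly noisy reply."""
    return (
        "Here is the extracted information:\n"
        '{"patient_age": "7 years", "diagnosis": "epilepsy", '
        '"medications": ["levetiracetam"]}'
    )

if __name__ == "__main__":
    reply = mock_llm(build_prompt("7-year-old with epilepsy, on levetiracetam."))
    print(parse_response(reply))
```

In a real deployment the `mock_llm` function would be replaced by a call to a commercial or locally hosted open-source model, and accuracy would be evaluated against manually extracted gold-standard annotations, as the study does across nine models.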