SEETrials: Leveraging large language models for safety and efficacy extraction in oncology clinical trials

Kyeryoung Lee; Hunki Paek; Liang-Chin Huang; C Beau Hilton; Surabhi Datta; Josh Higashi; Nneka Ofoegbu; Jingqi Wang; Samuel M. Rubinstein; Andrew J. Cowan; Mary Kwok; Jeremy L. Warner; Hua Xu; Xiaoyan Wang

Informatics in Medicine Unlocked (Jan 2024)

Affiliations

Kyeryoung Lee: IMO Health, Rosemont, IL, USA
Hunki Paek: IMO Health, Rosemont, IL, USA
Liang-Chin Huang: IMO Health, Rosemont, IL, USA
C Beau Hilton: Division of Hematology and Oncology, Vanderbilt University, Nashville, TN, USA
Surabhi Datta: IMO Health, Rosemont, IL, USA
Josh Higashi: IMO Health, Rosemont, IL, USA
Nneka Ofoegbu: IMO Health, Rosemont, IL, USA
Jingqi Wang: IMO Health, Rosemont, IL, USA
Samuel M. Rubinstein: Division of Hematology, University of North Carolina, Chapel Hill, NC, USA
Andrew J. Cowan: Division of Hematology and Oncology, University of Washington, Seattle, WA, USA
Mary Kwok: Division of Hematology and Oncology, University of Washington, Seattle, WA, USA
Jeremy L. Warner: Lifespan Cancer Institute, Rhode Island Hospital, Providence, RI, USA; Center for Clinical Cancer Informatics and Data Science, Legorreta Cancer Center, Brown University, Providence, RI, USA
Hua Xu: Biomedical Informatics and Data Science, Yale University, New Haven, CT, USA
Xiaoyan Wang: IMO Health, Rosemont, IL, USA; Corresponding author. IMO Health, 9600 West Bryn Mawr Avenue, Suite 100, Rosemont, IL 60018, USA.

Journal volume & issue: Vol. 50, p. 101589

Abstract

Background: Initial insights into oncology clinical trial outcomes are often gleaned manually from conference abstracts. We aimed to develop an automated system to extract safety and efficacy information from study abstracts with high precision and fine granularity, transforming them into computable data for timely clinical decision-making.
Methods: We collected clinical trial abstracts from key conferences and PubMed (2012–2023). The SEETrials system was developed with three modules: preprocessing, prompt engineering with knowledge ingestion, and postprocessing. We evaluated the system's performance qualitatively and quantitatively and assessed its generalizability across different cancer types: multiple myeloma (MM), breast, lung, lymphoma, and leukemia. Furthermore, the efficacy and safety of innovative therapies in MM, including CAR-T, bispecific antibodies, and antibody-drug conjugates (ADC), were analyzed across a large set of clinical trial studies.
Results: SEETrials achieved high precision (0.964), recall (sensitivity) (0.988), and F1 score (0.974) across 70 data elements present in the MM trial studies. Generalizability tests on four additional cancers yielded precision, recall, and F1 scores within the 0.979–0.992 range. Variation in the distribution of safety and efficacy-related entities was observed across diverse therapies, with certain adverse events more common in specific treatments. Comparative performance analysis using overall response rate (ORR) and complete response (CR) highlighted differences among therapies: CAR-T (ORR: 88 %, 95 % CI: 84–92 %; CR: 95 %, 95 % CI: 53–66 %), bispecific antibodies (ORR: 64 %, 95 % CI: 55–73 %; CR: 27 %, 95 % CI: 16–37 %), and ADC (ORR: 51 %, 95 % CI: 37–65 %; CR: 26 %, 95 % CI: 1–51 %). Notable study heterogeneity was identified (I² heterogeneity index > 75 %) across several outcome entities analyzed within therapy subgroups.
Conclusion: SEETrials demonstrated highly accurate data extraction and versatility across different therapeutics and various cancer domains. Its automated processing of large datasets facilitates nuanced data comparisons, promoting the swift and effective dissemination of clinical insights.
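
The abstract reports two kinds of summary statistics: extraction quality (precision, recall, F1) and between-study heterogeneity (I²). As a minimal illustrative sketch, and not the authors' implementation, the Python below computes both from their standard textbook definitions. The function names and all input numbers are hypothetical; the I² function assumes a fixed-effect, inverse-variance pooled estimate and Cochran's Q, which the abstract does not specify.

```python
from typing import Sequence, Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive,
    and false-negative counts. Hypothetical helper, not from the paper."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def i_squared(effects: Sequence[float], variances: Sequence[float]) -> float:
    """Return the I^2 heterogeneity index in percent.

    Assumption: inverse-variance weights and Cochran's Q,
    I^2 = max(0, (Q - df) / Q) * 100, with df = number of studies - 1.
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0  # identical study estimates: no observed heterogeneity
    return max(0.0, (q - df) / q) * 100.0


if __name__ == "__main__":
    # Made-up counts and study estimates, for illustration only.
    print(precision_recall_f1(tp=90, fp=10, fn=5))
    print(i_squared([0.2, 0.8], [0.01, 0.01]))
```

Under these assumptions, an I² above 75 % (the threshold quoted in the Results) means that most of the observed variation between study estimates exceeds what sampling error alone would be expected to produce.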

Published in Informatics in Medicine Unlocked

ISSN: 2352-9148 (Online)
Publisher: Elsevier
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.journals.elsevier.com/informatics-in-medicine-unlocked/
