CancerGPT for few shot drug pair synergy prediction using large pretrained language models

Tianhao Li; Sandesh Shetty; Advaith Kamath; Ajay Jaiswal; Xiaoqian Jiang; Ying Ding; Yejin Kim

doi:10.1038/s41746-024-01024-9

npj Digital Medicine (Feb 2024)

Affiliations

Tianhao Li: School of Information, University of Texas at Austin
Sandesh Shetty: Manning College of Information and Computer Sciences, University of Massachusetts Amherst
Advaith Kamath: Department of Chemical Engineering, University of Texas at Austin
Ajay Jaiswal: School of Information, University of Texas at Austin
Xiaoqian Jiang: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston
Ying Ding: School of Information, University of Texas at Austin
Yejin Kim: McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston

Journal volume & issue: Vol. 7, no. 1, pp. 1–10

Abstract

Large language models (LLMs) have shown significant potential in few-shot learning across various fields, even with minimal training data. However, their ability to generalize to unseen tasks in more complex fields, such as biology and medicine, has yet to be fully evaluated. LLMs can offer a promising alternative approach for biological inference, particularly where structured data and sample sizes are limited, by extracting prior knowledge from text corpora. Here we report a few-shot learning approach that uses LLMs to predict the synergy of drug pairs in rare tissues that lack structured data and features. Our experiments, which involved seven rare tissues from different cancer types, demonstrate that the LLM-based prediction model achieves significant accuracy with very few or zero samples. Our proposed model, CancerGPT (~124M parameters), is comparable to the larger fine-tuned GPT-3 model (~175B parameters). Our research contributes to tackling drug pair synergy prediction in rare tissues with limited data and to advancing the use of LLMs for biological and medical inference tasks.

Published in npj Digital Medicine

ISSN: 2398-6352 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics
Website: https://www.nature.com/npjdigitalmed/
