Large Language Models and Genomics for Summarizing the Role of microRNA in Regulating mRNA Expression

Balu Bhasuran; Sharanya Manoharan; Oviya Ramalakshmi Iyyappan; Gurusamy Murugesan; Archana Prabahar; Kalpana Raja

doi:10.3390/biomedicines12071535

Biomedicines (Jul 2024)

Affiliations

Balu Bhasuran: School of Information, Florida State University, Tallahassee, FL 32306, USA
Sharanya Manoharan: Department of Bioinformatics, Stella Maris College, Chennai 600086, Tamil Nadu, India
Oviya Ramalakshmi Iyyappan: Department of Computer Science and Engineering, Amrita School of Computing, Amrita Vishwa Vidyapeetham, Chennai 641112, Tamil Nadu, India
Gurusamy Murugesan: Department of Computer Science and Engineering, Koneru Lakshmaiah Education Foundation, Green Fields, Guntur District, Vaddeswaram 522302, Andhra Pradesh, India
Archana Prabahar: Center for Gene Regulation in Health and Disease, Department of Biological, Geological, and Environmental Sciences (BGES), Cleveland State University, Cleveland, OH 44115, USA
Kalpana Raja: Department of Biomedical Informatics and Data Science, School of Medicine, Yale University, New Haven, CT 06510, USA

Journal volume & issue: Vol. 12, no. 7
p. 1535

Abstract


microRNA (miRNA)–messenger RNA (mRNA or gene) interactions are pivotal in many biological processes. These include the regulation of gene expression, cellular differentiation, proliferation, apoptosis, and development. They are also central to the maintenance of cellular homeostasis and to the pathogenesis of numerous diseases, such as cancer, cardiovascular diseases, neurological disorders, and metabolic conditions. Understanding the mechanisms of miRNA–mRNA interactions can provide insights into disease mechanisms and potential therapeutic targets. However, efficiently extracting these interactions from the vast body of articles published in PubMed is challenging. In the current study, we annotated a miRNA–mRNA Interaction Corpus (MMIC). We used it to evaluate how well a variety of machine learning (ML) models, deep learning-based transformer (DLT) models, and large language models (LLMs) extract the miRNA–mRNA interactions mentioned in PubMed. We used genomics approaches to validate the extracted miRNA–mRNA interactions. Among all the ML, DLT, and LLM models, PubMedBERT achieved the highest F-score, with precision, recall, and F-score all equal to 0.783. Among the LLMs, Llama 2 performed best. It achieved 0.56 precision, 0.86 recall, and 0.68 F-score in a zero-shot experiment, and 0.56 precision, 0.87 recall, and 0.68 F-score in a three-shot experiment. Our study shows that Llama 2 achieves better recall than the ML and DLT models, while leaving room for improvement in precision and F-score.
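The abstract reports precision, recall, and F-score for the extracted miRNA–mRNA interactions. The sketch below shows one way such scores could be computed. It rests on three assumptions that the abstract does not confirm:

- Each extracted interaction is represented as a (miRNA, gene) pair.
- Pairs are matched exactly against the gold annotations.
- The F-score is the balanced F1, that is, the harmonic mean of precision and recall.

The example pairs are hypothetical placeholders, not data from MMIC. The helper names `prf` and `f_score` are likewise illustrative. The last lines also check that the reported F-scores are consistent with the reported precision and recall under the F1 assumption.

```python
from typing import Iterable, Set, Tuple

# Assumption: an interaction is a (miRNA, gene) pair; this representation is hypothetical.
Pair = Tuple[str, str]


def f_score(precision: float, recall: float) -> float:
    """Balanced F1: harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(predicted: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 of predicted pairs against gold pairs, using exact matching."""
    pred_set: Set[Pair] = set(predicted)
    gold_set: Set[Pair] = set(gold)
    true_positives = len(pred_set & gold_set)  # pairs found in both sets
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall, f_score(precision, recall)


# Hypothetical toy annotations (placeholders only, not drawn from MMIC).
gold = {("miR-A", "GENE1"), ("miR-B", "GENE2"), ("miR-C", "GENE3")}
predicted = {("miR-A", "GENE1"), ("miR-C", "GENE3"), ("miR-B", "GENE4")}

p, r, f = prf(predicted, gold)
print(f"toy example: precision={p:.3f} recall={r:.3f} F1={f:.3f}")

# Consistency check on the figures reported in the abstract (F1 assumed).
print(round(f_score(0.56, 0.86), 2))    # Llama 2, zero-shot
print(round(f_score(0.56, 0.87), 2))    # Llama 2, three-shot
print(round(f_score(0.783, 0.783), 3))  # PubMedBERT
```

Using sets makes duplicate extractions of the same pair count only once. A real evaluation might instead score each annotated sentence or mention separately, which would give different counts.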

Published in Biomedicines

ISSN: 2227-9059 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Science: Biology (General)
Website: http://www.mdpi.com/journal/biomedicines