Current Directions in Biomedical Engineering (Dec 2024)
Using Large Language Models for Extracting Structured Information From Scientific Texts
Abstract
Extracting structured information from scientific publications is challenging because the parameters or properties of interest are often scattered across lengthy texts. We introduce a novel iterative approach that uses Large Language Models (LLMs) to automate this process. Our method first condenses the scientific literature, preserving essential information in a dense format, and then retrieves predefined attributes. As a biomedical application example, we use the approach to extract experimental parameters for preparing Metal-Organic Frameworks (MOFs) from scientific publications, enabling complex and information-rich applications in the biotechnology-oriented life sciences. Our open-source method automates information extraction from verbose texts and converts them into structured, easily navigable data. This considerably improves scientific literature research by leveraging LLMs and paves the way for faster information extraction from extensive scientific texts.
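The two stages described in the abstract (condense first, then retrieve predefined attributes) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `llm` callable, the prompt wording, the word-based chunking, and the stopping rule are all assumptions made for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that takes a prompt and returns the model's reply.
LLM = Callable[[str], str]


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Split text into consecutive pieces of at most max_words words (assumed chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def condense(text: str, llm: LLM, max_words: int = 500, max_rounds: int = 5) -> str:
    """Condense the text chunk by chunk, repeating until it fits in max_words."""
    for _ in range(max_rounds):  # bounded, in case the model fails to shorten the text
        if len(text.split()) <= max_words:
            break
        chunks = split_into_chunks(text, max_words)
        # Illustrative prompt; the wording is an assumption, not taken from the paper.
        text = " ".join(
            llm("Condense the text below, keeping every experimental parameter:\n" + chunk)
            for chunk in chunks
        )
    return text


def extract_attributes(text: str, attributes: List[str], llm: LLM) -> Dict[str, str]:
    """Query each predefined attribute separately and collect the answers in a dictionary."""
    return {
        name: llm(
            f"From the text below, report the value of '{name}' or 'not reported':\n{text}"
        ).strip()
        for name in attributes
    }
```

With a real model behind `llm`, a call such as `extract_attributes(condense(paper_text, llm), ["solvent", "temperature"], llm)` would return one entry per requested attribute; the attribute names here are illustrative only.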
Keywords