Nature Communications (Nov 2024)

An automatic end-to-end chemical synthesis development platform powered by large language models

  • Yixiang Ruan,
  • Chenyin Lu,
  • Ning Xu,
  • Yuchen He,
  • Yixin Chen,
  • Jian Zhang,
  • Jun Xuan,
  • Jianzhang Pan,
  • Qun Fang,
  • Hanyu Gao,
  • Xiaodong Shen,
  • Ning Ye,
  • Qiang Zhang,
  • Yiming Mo

DOI
https://doi.org/10.1038/s41467-024-54457-x
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 16

Abstract

Read online

Abstract The rapid emergence of large language model (LLM) technology presents promising opportunities to facilitate the development of synthetic reactions. In this work, we leveraged the power of GPT-4 to build an LLM-based reaction development framework (LLM-RDF) to handle fundamental tasks involved throughout the chemical synthesis development. LLM-RDF comprises six specialized LLM-based agents, including Literature Scouter, Experiment Designer, Hardware Executor, Spectrum Analyzer, Separation Instructor, and Result Interpreter, which are pre-prompted to accomplish the designated tasks. A web application with LLM-RDF as the backend was built to allow chemist users to interact with automated experimental platforms and analyze results via natural language, thus, eliminating the need for coding skills and ensuring accessibility for all chemists. We demonstrated the capabilities of LLM-RDF in guiding the end-to-end synthesis development process for the copper/TEMPO catalyzed aerobic alcohol oxidation to aldehyde reaction, including literature search and information extraction, substrate scope and condition screening, reaction kinetics study, reaction condition optimization, reaction scale-up and product purification. Furthermore, LLM-RDF’s broader applicability and versability was validated on various synthesis tasks of three distinct reactions (SNAr reaction, photoredox C-C cross-coupling reaction, and heterogeneous photoelectrochemical reaction).