Scientific Reports (Apr 2025)

Achieving GPT-4o level performance in astronomy with a specialized 8B-parameter large language model

  • Tijmen de Haan,
  • Yuan-Sen Ting,
  • Tirthankar Ghosal,
  • Tuan Dung Nguyen,
  • Alberto Accomazzi,
  • Azton Wells,
  • Nesar Ramachandra,
  • Rui Pan,
  • Zechang Sun

DOI
https://doi.org/10.1038/s41598-025-97131-y
Journal volume & issue
Vol. 15, no. 1
pp. 1–10

Abstract


AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, cosmology, and astronomical instrumentation. Trained on the complete collection of astronomy-related arXiv papers from 2007 to 2024, along with millions of synthetically generated question-answer pairs and other astronomical literature, AstroSage-Llama-3.1-8B demonstrates remarkable proficiency on a wide range of questions. AstroSage-Llama-3.1-8B scores 80.9% on the AstroMLab-1 benchmark, greatly outperforming all models, proprietary and open-weight, in the 8-billion-parameter class, and performing on par with GPT-4o. This achievement demonstrates the potential of domain specialization in AI, suggesting that focused training can yield capabilities exceeding those of much larger, general-purpose models. AstroSage-Llama-3.1-8B is freely available, enabling widespread access to advanced AI capabilities for astronomical education and research.
