Scientific Reports (Apr 2025)

Achieving GPT-4o level performance in astronomy with a specialized 8B-parameter large language model

  • Tijmen de Haan,
  • Yuan-Sen Ting,
  • Tirthankar Ghosal,
  • Tuan Dung Nguyen,
  • Alberto Accomazzi,
  • Azton Wells,
  • Nesar Ramachandra,
  • Rui Pan,
  • Zechang Sun

DOI
https://doi.org/10.1038/s41598-025-97131-y
Journal volume & issue
Vol. 15, no. 1
pp. 1–10

Abstract


AstroSage-Llama-3.1-8B is a domain-specialized natural-language AI assistant tailored for research in astronomy, astrophysics, cosmology, and astronomical instrumentation. Trained on the complete collection of astronomy-related arXiv papers from 2007 to 2024, along with millions of synthetically generated question-answer pairs and other astronomical literature, AstroSage-Llama-3.1-8B demonstrates remarkable proficiency on a wide range of questions. AstroSage-Llama-3.1-8B scores 80.9% on the AstroMLab-1 benchmark, greatly outperforming all models, proprietary and open-weight, in the 8-billion-parameter class, and performing on par with GPT-4o. This achievement demonstrates the potential of domain specialization in AI, suggesting that focused training can yield capabilities exceeding those of much larger, general-purpose models. AstroSage-Llama-3.1-8B is freely available, enabling widespread access to advanced AI capabilities for astronomical education and research.
