The SciQA Scientific Question Answering Benchmark for Scholarly Knowledge

Sören Auer; Dante A. C. Barone; Cassiano Bartz; Eduardo G. Cortes; Mohamad Yaser Jaradeh; Oliver Karras; Manolis Koubarakis; Dmitry Mouromtsev; Dmitrii Pliukhin; Daniil Radyush; Ivan Shilin; Markus Stocker; Eleni Tsalapati

doi:10.1038/s41598-023-33607-z

Scientific Reports (May 2023)

Affiliations

Sören Auer: TIB—Leibniz Information Centre for Science and Technology
Dante A. C. Barone: Institute of Informatics, Federal University of Rio Grande do Sul
Cassiano Bartz: Institute of Informatics, Federal University of Rio Grande do Sul
Eduardo G. Cortes: Institute of Informatics, Federal University of Rio Grande do Sul
Mohamad Yaser Jaradeh: TIB—Leibniz Information Centre for Science and Technology
Oliver Karras: TIB—Leibniz Information Centre for Science and Technology
Manolis Koubarakis: Department of Informatics and Telecommunications, National and Kapodistrian University of Athens
Dmitry Mouromtsev: Laboratory of Information Science and Semantic Technologies, ITMO University
Dmitrii Pliukhin: Laboratory of Information Science and Semantic Technologies, ITMO University
Daniil Radyush: Laboratory of Information Science and Semantic Technologies, ITMO University
Ivan Shilin: Laboratory of Information Science and Semantic Technologies, ITMO University
Markus Stocker: TIB—Leibniz Information Centre for Science and Technology
Eleni Tsalapati: Department of Informatics and Telecommunications, National and Kapodistrian University of Athens

DOI: https://doi.org/10.1038/s41598-023-33607-z
Journal volume & issue: Vol. 13, no. 1, pp. 1–16

Abstract

Knowledge graphs have gained increasing popularity in science and technology over the last decade. However, knowledge graphs are currently relatively simple to moderately complex semantic structures that are mainly collections of factual statements. Question answering (QA) benchmarks and systems have so far been geared mainly towards encyclopedic knowledge graphs such as DBpedia and Wikidata. We present SciQA, a scientific QA benchmark for scholarly knowledge. The benchmark leverages the Open Research Knowledge Graph (ORKG), which includes almost 170,000 resources describing research contributions of almost 15,000 scholarly articles from 709 research fields. Following a bottom-up methodology, we first manually developed a set of 100 complex questions that can be answered using this knowledge graph. Furthermore, we devised eight question templates with which we automatically generated a further 2465 questions that can also be answered with the ORKG. The questions cover a range of research fields and question types and are translated into corresponding SPARQL queries over the ORKG. Based on two preliminary evaluations, we show that the resulting SciQA benchmark represents a challenging task for next-generation QA systems. This task is part of the open competitions at the 22nd International Semantic Web Conference 2023 as the Scholarly Question Answering over Linked Data (QALD) Challenge.

Published in Scientific Reports

ISSN: 2045-2322 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Medicine; Science
Website: https://www.nature.com/srep/