BioASQ-QA: A manually curated corpus for Biomedical Question Answering

Anastasia Krithara; Anastasios Nentidis; Konstantinos Bougiatiotis; Georgios Paliouras

doi:10.1038/s41597-023-02068-4

Scientific Data (Mar 2023)

Affiliations

Anastasia Krithara: Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”
Anastasios Nentidis: Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”
Konstantinos Bougiatiotis: Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”
Georgios Paliouras: Institute of Informatics and Telecommunications, National Center for Scientific Research “Demokritos”

Journal volume & issue: Vol. 10, no. 1, pp. 1–12

Abstract

The BioASQ question answering (QA) benchmark dataset contains questions in English, along with gold-standard (reference) answers and related material. The dataset has been designed to reflect real information needs of biomedical experts and is therefore more realistic and challenging than most existing datasets. Furthermore, unlike most previous QA benchmarks, which contain only exact answers, the BioASQ-QA dataset also includes ideal answers (in effect, summaries), which are particularly useful for research on multi-document summarization. The dataset combines structured and unstructured data. The materials linked with each question comprise documents and snippets, which are useful for Information Retrieval and Passage Retrieval experiments, as well as concepts, which are useful in concept-to-text Natural Language Generation. Researchers working on paraphrasing and textual entailment can also measure the degree to which their methods improve the performance of biomedical QA systems. Last but not least, the dataset is continuously extended, as the BioASQ challenge is ongoing and new data are generated.

Published in Scientific Data

ISSN: 2052-4463 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/sdata/