PeMeBench: Chinese pediatric medical Q&A benchmark testing method

ZHANG Qian; CHEN Panfeng; FENG Linkun; LIU Shuyu; MA Dan; CHEN Mei; LI Hui

大数据 (Big Data Research) (Sep 2024)

Journal volume & issue: Vol. 10, pp. 28–44

Abstract

Large language models (LLMs) have demonstrated significant application potential in the medical field. However, evaluating their performance in medical scenarios remains a challenge. Existing medical benchmarks, predominantly in the form of multiple-choice questions, struggle to comprehensively and accurately assess LLM performance in the pediatric domain. To address this issue, PeMeBench, the first Chinese pediatric question-answering benchmark, was proposed. Using dual-perspective evaluation dimensions and referencing diagnostic and treatment guidelines covering 10 pediatric disease systems, PeMeBench categorized pediatric medical question-answering tasks into five subdomains: disease knowledge, treatment plans, medication dosages, disease prevention, and pharmacological effects. It comprised over 10,000 open-ended question-answering items and introduced a multi-grained automated evaluation scheme that integrated entity retrieval with the detection of hallucinated sentences. This approach aimed to provide a comprehensive and precise assessment of LLM performance in pediatric healthcare, to probe the models' potential limitations, and to lay a foundation for improving the intelligence of medical services.
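
The abstract does not specify how the entity-retrieval part of the evaluation is computed. As a rough illustration only, the Python sketch below scores an open-ended answer by the fraction of reference entities it mentions. The names `EntityScore` and `score_entities`, the verbatim substring matching, and the sample strings are all assumptions made for this sketch and are not taken from PeMeBench; the hallucinated-sentence detection step is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class EntityScore:
    """Hypothetical container for an entity-level comparison result."""
    matched: list[str]
    missing: list[str]

    @property
    def recall(self) -> float:
        # Fraction of reference entities found in the answer.
        total = len(self.matched) + len(self.missing)
        return len(self.matched) / total if total else 0.0


def score_entities(answer: str, reference_entities: list[str]) -> EntityScore:
    """Check which reference entities appear verbatim in the model answer.

    Assumption: case-insensitive substring matching stands in for whatever
    entity retrieval the benchmark actually uses.
    """
    text = answer.casefold()
    matched: list[str] = []
    missing: list[str] = []
    for entity in reference_entities:
        if entity.casefold() in text:
            matched.append(entity)
        else:
            missing.append(entity)
    return EntityScore(matched, missing)


if __name__ == "__main__":
    # Placeholder strings for illustration; not items from the benchmark.
    result = score_entities(
        "Give oral rehydration salts and continue breastfeeding.",
        ["oral rehydration salts", "zinc", "breastfeeding"],
    )
    print(result.recall, result.missing)
```

Substring matching is the simplest possible choice and misses synonyms and paraphrases; a real evaluation would likely need entity normalization or a retrieval model in its place.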

Keywords: pediatric medicine; benchmark testing; large language model; Q&A

Published in 大数据

ISSN: 2096-0271 (Print)
Publisher: China InfoCom Media Group
Country of publisher: China
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://www.infocomm-journal.com/bdr/EN/2096-0271/home.shtml
