Evaluatology: The science and engineering of evaluation

Jianfeng Zhan; Lei Wang; Wanling Gao; Hongxiao Li; Chenxi Wang; Yunyou Huang; Yatao Li; Zhengxin Yang; Guoxin Kang; Chunjie Luo; Hainan Ye; Shaopeng Dai; Zhifei Zhang

BenchCouncil Transactions on Benchmarks, Standards and Evaluations (Mar 2024)


Affiliations

Jianfeng Zhan: The International Open Benchmark Council, Delaware, USA; ICT, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Corresponding author.
Lei Wang: ICT, Chinese Academy of Sciences, Beijing, China; The International Open Benchmark Council, Delaware, USA; University of Chinese Academy of Sciences, Beijing, China
Wanling Gao: ICT, Chinese Academy of Sciences, Beijing, China; The International Open Benchmark Council, Delaware, USA; University of Chinese Academy of Sciences, Beijing, China
Hongxiao Li: ICT, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Chenxi Wang: ICT, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Yunyou Huang: Guangxi Normal University, Guilin, Guangxi, China; The International Open Benchmark Council, Delaware, USA
Yatao Li: Microsoft Research Asia, Beijing, China
Zhengxin Yang: ICT, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Guoxin Kang: ICT, Chinese Academy of Sciences, Beijing, China; The International Open Benchmark Council, Delaware, USA; University of Chinese Academy of Sciences, Beijing, China
Chunjie Luo: ICT, Chinese Academy of Sciences, Beijing, China; The International Open Benchmark Council, Delaware, USA; University of Chinese Academy of Sciences, Beijing, China
Hainan Ye: Hong Kong International Evaluation and Benchmark Research, Hong Kong, China; ICT, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
Shaopeng Dai: University of Chinese Academy of Sciences, Beijing, China
Zhifei Zhang: Capital Medical University, Beijing, China; The International Open Benchmark Council, Delaware, USA

Journal volume & issue: Vol. 4, no. 1, p. 100162

Abstract

Evaluation is a crucial aspect of human existence and plays a vital role in every field. However, it is often approached in an empirical and ad hoc manner, without consensus on universal concepts, terminologies, theories, and methodologies. This lack of agreement has significant consequences. This article aims to formally introduce the discipline of evaluatology, which encompasses the science and engineering of evaluation. We propose a universal framework for evaluation, encompassing concepts, terminologies, theories, and methodologies that can be applied across various disciplines, if not all disciplines. Our research reveals that the essence of evaluation lies in conducting experiments that intentionally apply a well-defined evaluation condition to individuals or systems under scrutiny, which we refer to as the subjects. This process allows for the creation of an evaluation system or model. By measuring and/or testing this evaluation system or model, we can infer the impact of different subjects. Derived from the essence of evaluation, we propose five axioms focusing on key aspects of evaluation outcomes as the foundational evaluation theory. These axioms serve as the bedrock upon which we build universal evaluation theories and methodologies. When evaluating a single subject, it is crucial to create evaluation conditions with different levels of equivalency. By applying these conditions to diverse subjects, we can establish reference evaluation models. These models allow us to alter a single independent variable at a time while keeping all other variables as controls. When evaluating complex scenarios, the key lies in establishing a series of evaluation models that maintain transitivity. Building upon the science of evaluation, we propose a formal definition of a benchmark as a simplified and sampled evaluation condition that guarantees different levels of equivalency. This concept serves as the cornerstone for a universal benchmark-based engineering approach to evaluation across various disciplines, which we refer to as benchmarkology.
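
As an illustration only, and not taken from the article, the controlled-comparison idea in the abstract (apply one well-defined evaluation condition to several subjects, then alter a single variable while holding the others fixed) might be sketched in Python as below. The names `EvaluationCondition`, `evaluate`, `compare_subjects`, and `vary_one`, the two condition variables, and the two toy subjects are all hypothetical and chosen purely for this sketch; the article's own formalism may differ.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, Iterable

@dataclass(frozen=True)
class EvaluationCondition:
    # Hypothetical condition variables, assumed for this sketch only.
    workload_size: int
    repetitions: int

# A subject maps a workload size to one measured cost (assumed interface).
Subject = Callable[[int], float]

def evaluate(subject: Subject, condition: EvaluationCondition) -> float:
    """Apply the condition to the subject and return the mean measurement."""
    total = sum(subject(condition.workload_size)
                for _ in range(condition.repetitions))
    return total / condition.repetitions

def compare_subjects(subjects: Dict[str, Subject],
                     condition: EvaluationCondition) -> Dict[str, float]:
    """Evaluate every subject under the same condition, so that the
    subject is the only thing that differs between measurements."""
    return {name: evaluate(s, condition) for name, s in subjects.items()}

def vary_one(subject: Subject, base: EvaluationCondition,
             variable: str, values: Iterable[int]) -> Dict[int, float]:
    """Change a single condition variable; all others stay at `base`."""
    return {v: evaluate(subject, replace(base, **{variable: v}))
            for v in values}

# Toy deterministic subjects standing in for real systems under evaluation.
def linear_cost(n: int) -> float:
    return 2.0 * n

def quadratic_cost(n: int) -> float:
    return 0.5 * n * n

base = EvaluationCondition(workload_size=10, repetitions=3)
subjects = {"linear": linear_cost, "quadratic": quadratic_cost}

print(compare_subjects(subjects, base))
# {'linear': 20.0, 'quadratic': 50.0}
print(vary_one(linear_cost, base, "workload_size", [1, 2, 4]))
# {1: 2.0, 2: 4.0, 4: 8.0}
```

Making the condition an immutable value keeps each comparison honest: a new condition has to be derived explicitly with `replace`, so it is always visible which single variable was altered.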

Published in BenchCouncil Transactions on Benchmarks, Standards and Evaluations

ISSN: 2772-4859 (Online)
Publisher: KeAi Communications Co. Ltd.
Country of publisher: China
LCC subjects: Science; Technology: Engineering (General). Civil engineering (General)
Website: https://www.keaipublishing.com/en/journals/benchcouncil-transactions-on-benchmarks-standards-and-evaluations/
