Computers and Education: Artificial Intelligence (Jan 2022)

Appraisal of high-stake examinations during SARS-CoV-2 emergency with responsible and transparent AI: Evidence of fair and detrimental assessment

  • MD. Rayhan,
  • MD. Golam Rabiul Alam,
  • M. Ali Akber Dewan,
  • M. Helal Uddin Ahmed

Journal volume & issue
Vol. 3
p. 100077

Abstract


In situations like the coronavirus pandemic, colleges and universities are forced to limit their regular, in-person academic activities. Extended postponement of high-stakes exams due to health risks reduces productivity and delays progress in subsequent years. Several countries decided to organize the exams online. Because many other countries with large education boards lacked adequate infrastructure and resources during the emergency, education policy experts sought a solution that would protect public health while fully resuming high-stakes assessment: canceling the offline exams and introducing a uniform assessment process to be followed across states and education boards. This research proposes a novel system that uses an AI model to perform the complex task of evaluating all students across education boards as fairly as possible, and it analyzes the system's ability to appraise exam grades fairly in the context of high-stakes examinations during the SARS-CoV-2 emergency. Specifically, a logistic regression classifier on top of a deep neural network outputs grade predictions that are as fair as possible for all learners. The predictions of the proposed grade-awarding system are explained with the SHAP (SHapley Additive exPlanations) framework, which identifies the features of the students' portfolios that contributed most to the predicted grades. In an empirical analysis of one of the largest education systems in the Global South, 81.85% of learners were assigned fair scores, while 3.12% of the predicted scores were substantially lower than the actual grades, which would have been detrimental had the system been deployed in practice. Furthermore, SHAP allows policy-makers to debug the predictive model by identifying and measuring the importance of the factors involved in its decisions and by removing features that should not play a role in the model's "reasoning" process.
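
The abstract describes a deep neural network with a logistic-regression output layer for grade prediction, explained post hoc with SHAP. The sketch below illustrates that general pipeline under stated assumptions; the feature set, layer sizes, number of grade bands, and training data are illustrative placeholders, not the authors' actual configuration or dataset.

```python
# Minimal sketch: DNN feature extractor + logistic-regression (softmax) head,
# explained with SHAP. All data, feature counts, and layer sizes are assumptions.
import numpy as np
import shap
from tensorflow import keras

def build_grade_model(n_features: int, n_grades: int) -> keras.Model:
    """Deep neural network whose final softmax layer acts as a multinomial
    logistic regression classifier over grade bands."""
    return keras.Sequential([
        keras.layers.Input(shape=(n_features,)),
        keras.layers.Dense(128, activation="relu"),            # hidden representation
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(n_grades, activation="softmax"),    # logistic-regression head
    ])

# Hypothetical student-portfolio data: rows = students, columns = portfolio
# features (e.g. prior internal marks, subject scores, school-level statistics).
X_train = np.random.rand(1000, 12).astype("float32")
y_train = np.random.randint(0, 5, size=1000)                   # 5 illustrative grade bands

model = build_grade_model(n_features=12, n_grades=5)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, epochs=10, batch_size=32, verbose=0)

# Explain individual grade predictions: the SHAP values quantify how much each
# portfolio feature pushed a student's predicted grade up or down, which is the
# kind of evidence policy-makers could use to audit or debug the model.
background = X_train[:100]                        # reference sample for the explainer
explainer = shap.KernelExplainer(model.predict, background)
shap_values = explainer.shap_values(X_train[:10]) # explanations for 10 students
```

In a sketch like this, features that SHAP consistently flags as influential but that should not affect grading (for example, attributes correlated with a school's socioeconomic profile) could be removed and the model retrained, mirroring the debugging workflow the abstract describes.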

Keywords