Frontiers in Education (Aug 2024)
Developing valid assessments in the era of generative artificial intelligence
Abstract
Generative Artificial Intelligence (GAI) holds tremendous potential to transform the field of education because GAI models can consider context and therefore can be trained to deliver quick and meaningful evaluation of student learning outcomes. However, current versions of GAI tools have considerable limitations, such as social biases often inherent in the data sets used to train the models. Moreover, the GAI revolution comes during a period of moving away from memorization-based education systems toward supporting learners in developing the ability to apply knowledge and skills to solve real-world problems and explain real-world phenomena. A challenge in using GAI tools for scoring assessments aimed at fostering knowledge application is ensuring that these algorithms score the same construct attributes (e.g., knowledge and skills) that a trained human scorer would when evaluating student performance. Similarly, if using GAI tools to develop assessments, one needs to ensure that the goals of GAI-generated assessments are aligned with the vision and performance expectations of the learning environments for which these assessments are developed. Currently, no guidelines have been identified for assessing the validity of AI-based assessments and assessment results. This paper presents a conceptual analysis of issues related to developing and validating GAI-based assessments and assessment results to guide the learning process. Our primary focus is to investigate how to meaningfully leverage the capabilities of GAI for developing assessments. We propose ways to evaluate validity evidence for GAI-produced assessments and assessment scores based on existing validation approaches. We discuss future research avenues aimed at establishing guidelines and methodologies for assessing the validity of AI-based assessments and assessment results.
We ground our discussion in the theory of validity outlined in the Standards for Educational and Psychological Testing by the American Educational Research Association and discuss how we envision building on the Standards to establish the validity of inferences made from test scores in the context of GAI-based assessments.
Keywords