Patterns (May 2022)
Reliance on metrics is a fundamental challenge for AI
Abstract
Summary: Through a series of case studies, we review how the unthinking pursuit of metric optimization can lead to real-world harms, including recommendation systems promoting radicalization, well-loved teachers fired by an algorithm, and essay grading software that rewards sophisticated garbage. The metrics used are often proxies for underlying, unmeasurable quantities (e.g., “watch time” of a video as a proxy for “user satisfaction”). We propose an evidence-based framework to mitigate such harms by (1) using a slate of metrics to get a fuller and more nuanced picture; (2) conducting external algorithmic audits; (3) combining metrics with qualitative accounts; and (4) involving a range of stakeholders, including those who will be most impacted.

The bigger picture: The success of current artificial intelligence (AI) approaches such as deep learning centers on their unreasonable effectiveness at metric optimization, yet overemphasizing metrics leads to a variety of real-world harms, including manipulation, gaming, and a myopic focus on short-term qualities and inadequate proxies. This principle is classically captured in Goodhart’s law: when a measure becomes a target, it ceases to be a good measure. Current AI approaches have weaponized Goodhart’s law by centering on optimizing a particular measure as a target. This poses a grand contradiction within AI design and ethics: optimizing metrics results in far-from-optimal outcomes. It is crucial to understand this dynamic in order to mitigate the risks and harms we are facing as a result of the misuse of AI.
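As a minimal illustration of this dynamic (a toy sketch, not drawn from the paper), the Python simulation below models a hypothetical recommender. Each item has an unobservable true quality and a “clickbait” component; watch time, the observable proxy, rewards both, while clickbait erodes the hidden target, user satisfaction. All quantities, weights, and the complaint-report signal are invented for illustration. Optimizing the proxy alone selects heavily gamed items (Goodhart’s law), whereas a slate of metrics (recommendation 1 above) recovers much more of the true target.

```python
import random

random.seed(42)

def make_item():
    """One hypothetical item with observable and hidden attributes."""
    quality = random.uniform(0, 1)    # hidden: genuine value to the user
    clickbait = random.uniform(0, 1)  # hidden: gaming component
    return {
        "watch_time": quality + clickbait,                 # observable proxy
        "satisfaction": quality - 0.8 * clickbait,         # unmeasurable target
        "reports": clickbait + random.uniform(-0.1, 0.1),  # second observable signal
    }

items = [make_item() for _ in range(10_000)]

def mean_satisfaction(selected):
    return sum(i["satisfaction"] for i in selected) / len(selected)

# Policy A: optimize the single proxy metric -- Goodhart's law in action.
by_proxy = sorted(items, key=lambda i: i["watch_time"], reverse=True)[:100]

# Policy B: a slate of metrics -- the same proxy, tempered by a complaint
# signal that catches gaming.
by_slate = sorted(items, key=lambda i: i["watch_time"] - i["reports"],
                  reverse=True)[:100]

print(f"proxy-only policy:   mean satisfaction {mean_satisfaction(by_proxy):.2f}")
print(f"metric-slate policy: mean satisfaction {mean_satisfaction(by_slate):.2f}")
```

In a typical run, the proxy-only selection scores far worse on the hidden target than the slate-based selection, even though both look excellent on watch time: the harder the single measure is optimized, the more the gaming component dominates what gets selected.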