PLoS ONE (Jan 2021)

Mathematically aggregating experts' predictions of possible futures.

  • A M Hanea,
  • D P Wilkinson,
  • M McBride,
  • A Lyon,
  • D van Ravenzwaaij,
  • F Singleton Thorn,
  • C Gray,
  • D R Mandel,
  • A Willcox,
  • E Gould,
  • E T Smith,
  • F Mody,
  • M Bush,
  • F Fidler,
  • H Fraser,
  • B C Wintle

DOI
https://doi.org/10.1371/journal.pone.0256919
Journal volume & issue
Vol. 16, no. 9
p. e0256919

Abstract


Structured protocols offer a transparent and systematic way to elicit and aggregate probabilistic predictions from multiple experts. These judgments can be aggregated behaviourally or mathematically to derive a final group prediction. Mathematical rules (e.g., weighted linear combinations of judgments) provide an objective approach to aggregation. The quality of an aggregation can be assessed in terms of accuracy, calibration and informativeness. These measures can be used to compare different aggregation approaches and to help decide which aggregation produces the "best" final prediction. When experts' performance can be scored on similar questions ahead of time, these scores can be translated into performance-based weights, and a performance-based weighted aggregation can then be used. When this is not possible, however, several other aggregation methods, informed by measurable proxies for good performance, can be formulated and compared. Here, we develop a suite of aggregation methods informed by previous experience and the available literature. We differentially weight our experts' estimates by measures of reasoning, engagement, openness to changing their mind, informativeness, prior knowledge, and the extremity, asymmetry or granularity of their estimates. We then investigate the relative performance of these aggregation methods using three datasets. The main goal of this research is to explore how measures of individuals' knowledge and behaviour can be leveraged to produce a better-performing combined group judgment. Although the accuracy, calibration and informativeness of the majority of methods are very similar, a couple of the aggregation methods consistently distinguish themselves as among the best or worst. Moreover, the majority of methods outperform the usual benchmarks provided by the simple average or the median of estimates.
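To make the idea of a weighted linear combination concrete, the sketch below pools several experts' probabilities for a single event as a weighted arithmetic mean and compares the result with the simple-average and median benchmarks. It is a minimal illustration under stated assumptions, not the authors' implementation: the proxy weights and the toy estimates are hypothetical, and the Brier score is used here only as one common accuracy measure; the paper's own weighting schemes (reasoning, engagement, and so on) and scoring rules are not reproduced.

```python
from statistics import mean, median


def weighted_linear_pool(estimates, weights):
    """Combine experts' probabilities as a weighted arithmetic mean.

    Weights are normalised to sum to 1, so any non-negative scores
    (e.g. a hypothetical proxy such as a reasoning-quality rating)
    can be passed in directly.
    """
    if len(estimates) != len(weights):
        raise ValueError("need one weight per estimate")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(p * w for p, w in zip(estimates, weights)) / total


def brier_score(forecast, outcome):
    """Squared error between a probability and a 0/1 outcome (lower is better)."""
    return (forecast - outcome) ** 2


# Hypothetical toy data: three experts' probabilities that one event occurs,
# paired with illustrative proxy scores used as weights.
estimates = [0.9, 0.6, 0.2]
proxy_weights = [3.0, 2.0, 1.0]
outcome = 1  # assume, for illustration, that the event did occur

pooled = weighted_linear_pool(estimates, proxy_weights)
baseline_mean = mean(estimates)
baseline_median = median(estimates)

for name, value in [("weighted", pooled), ("mean", baseline_mean), ("median", baseline_median)]:
    print(f"{name:>8}: p={value:.3f}  Brier={brier_score(value, outcome):.3f}")
```

With equal weights the pool reduces to the simple average, which is why the average serves as a natural benchmark. In this toy example the weighted pool scores best only because the most heavily weighted expert happened to be closest to the outcome; whether proxy-based weights help in practice is an empirical question.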