Genome Biology (May 2024)

Commonly used software tools produce conflicting and overly-optimistic AUPRC values

  • Wenyu Chen,
  • Chen Miao,
  • Zhenghao Zhang,
  • Cathy Sin-Hang Fung,
  • Ran Wang,
  • Yizhen Chen,
  • Yan Qian,
  • Lixin Cheng,
  • Kevin Y. Yip,
  • Stephen Kwok-Wing Tsui,
  • Qin Cao

DOI
https://doi.org/10.1186/s13059-024-03266-y
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 12

Abstract

The precision-recall curve (PRC) and the area under the precision-recall curve (AUPRC) are useful for quantifying classification performance. They are commonly used in situations with imbalanced classes, such as cancer diagnosis and cell type annotation. We evaluate 10 popular tools for plotting PRC and computing AUPRC, which were collectively used in more than 3000 published studies. We find that the AUPRC values computed by these tools rank classifiers differently, and that some tools produce overly optimistic results.
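
As an illustration only, and not the authors' analysis, the sketch below shows one way two reasonable AUPRC conventions can disagree on the same predictions. The abstract does not say which causes the authors identified; the toy data, the function names (`pr_points`, `average_precision`, `trapezoid_auprc`), and the choice of these two conventions are all assumptions made for this example. The first convention sums precision weighted by the gain in recall (often called average precision); the second joins the precision-recall points with straight lines, which is generally regarded as optimistic when many predictions share the same score.

```python
def pr_points(labels, scores):
    """Return (recall, precision) points, one per distinct score threshold.

    Predictions with the same score are treated as one tied group.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    total_pos = sum(labels)
    points = []
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every prediction tied at this threshold before emitting a point.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(points):
    """Step-wise area: precision at each point times the recall gained there."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def trapezoid_auprc(points):
    """Straight-line area, anchored at (recall=0, precision=1) by assumption."""
    area, prev_recall, prev_precision = 0.0, 0.0, 1.0
    for recall, precision in points:
        area += (recall - prev_recall) * (precision + prev_precision) / 2
        prev_recall, prev_precision = recall, precision
    return area


# Hypothetical toy data: one confident positive, then nine predictions
# tied at the same score (one positive, eight negatives).
labels = [1, 1] + [0] * 8
scores = [0.9] + [0.5] * 9

points = pr_points(labels, scores)
print(f"average precision: {average_precision(points):.3f}")  # 0.600
print(f"trapezoid AUPRC:   {trapezoid_auprc(points):.3f}")    # 0.800
```

On this toy input the two conventions differ by 0.2, solely because of how the tied group is bridged. A gap of that size is enough to reorder classifiers whose true performance is close.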