PLoS ONE (Jan 2013)

Substantial agreement of referee recommendations at a general medical journal – a peer review evaluation at Deutsches Ärzteblatt International.

  • Christopher Baethge
  • Jeremy Franklin
  • Stephan Mertens

DOI: https://doi.org/10.1371/journal.pone.0061401
Journal volume & issue: Vol. 8, no. 5, p. e61401

Abstract


BACKGROUND: Peer review is the mainstay of editorial decision making at medical journals. There is a dearth of evaluations of journal peer review with regard to reliability and validity, particularly in the light of the wide variety of medical journals. Studies carried out so far indicate low agreement among reviewers. We present an analysis of the peer review process at a general medical journal, Deutsches Ärzteblatt International.

METHODOLOGY/PRINCIPAL FINDINGS: A total of 554 reviewer recommendations on 206 manuscripts submitted between July 2008 and December 2009 were analyzed: 7% recommended acceptance, 74% revision, and 19% rejection. Concerning acceptance (with or without revision) versus rejection, agreement among reviewers was substantial (74.3% of pairs of recommendations), but this was not reflected by Fleiss' or Cohen's kappa (<0.2). The agreement rate was 84% for acceptance but only 31% for rejection. An alternative statistic, Gwet's kappa (AC1), however, indicated substantial agreement (0.63). Concordance between reviewer recommendations and editorial decisions was almost perfect when reviewer recommendations were unanimous. The correlation between reviewer recommendations and citations as counted by Web of Science was low (partial correlation adjusted for year of publication: -0.03, n.s.).

CONCLUSIONS/SIGNIFICANCE: Although our figures are similar to those reported in the literature, our conclusion differs from the widely held view that reviewer agreement is low: based on overall agreement, we consider the concordance among reviewers sufficient for the purposes of editorial decision making. We believe that measures such as positive and negative agreement or alternative kappa statistics are superior to Cohen's or Fleiss' kappa for analyzing reviewer agreement on nominal or ordinal data. Also, reviewer recommendations appear to be a poor proxy for citations because, for example, manuscripts may change considerably during the revision process.
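The methodological point of the abstract, that high overall agreement can coexist with a low Cohen's kappa when one recommendation category dominates (the so-called kappa paradox), while Gwet's AC1 and the specific positive/negative agreement rates remain interpretable, can be illustrated with a small worked example. The sketch below uses a hypothetical 2x2 table of paired reviewer recommendations (acceptance vs. rejection); the cell counts are invented for illustration and are not the study's data, and the helper agreement_stats is our own, not code from the paper.

```python
# Illustrative sketch (hypothetical data): high overall agreement with a low
# Cohen's kappa when one category dominates, and how Gwet's AC1 and specific
# (positive/negative) agreement behave on the same table.

def agreement_stats(a, b, c, d):
    """2x2 table of paired reviewer recommendations on the same manuscripts:
       a = both recommend acceptance, d = both recommend rejection,
       b, c = the two discordant cells."""
    n = a + b + c + d
    p_o = (a + d) / n                      # observed (overall) agreement

    # Marginal proportion of "acceptance" for each reviewer
    p1_accept = (a + b) / n
    p2_accept = (a + c) / n

    # Cohen's kappa: chance agreement from the product of the marginals
    p_e_cohen = p1_accept * p2_accept + (1 - p1_accept) * (1 - p2_accept)
    kappa = (p_o - p_e_cohen) / (1 - p_e_cohen)

    # Gwet's AC1 (two categories): chance agreement 2*pi*(1-pi),
    # with pi the mean "acceptance" proportion across both reviewers
    pi = (p1_accept + p2_accept) / 2
    p_e_gwet = 2 * pi * (1 - pi)
    ac1 = (p_o - p_e_gwet) / (1 - p_e_gwet)

    # Specific agreement: positive (acceptance) and negative (rejection)
    pos_agreement = 2 * a / (2 * a + b + c)
    neg_agreement = 2 * d / (2 * d + b + c)
    return p_o, kappa, ac1, pos_agreement, neg_agreement

# Hypothetical table of 100 manuscript pairs, skewed toward acceptance
p_o, kappa, ac1, pa, na = agreement_stats(a=70, b=12, c=12, d=6)
print(f"overall agreement   {p_o:.2f}")   # 0.76
print(f"Cohen's kappa       {kappa:.2f}") # ~0.19, "low" despite 76% agreement
print(f"Gwet's AC1          {ac1:.2f}")   # ~0.66, "substantial"
print(f"positive agreement  {pa:.2f}")    # ~0.85
print(f"negative agreement  {na:.2f}")    # ~0.33
```

With these invented counts the pattern resembles the one reported in the abstract: roughly three quarters of recommendation pairs agree, Cohen's kappa stays below 0.2, Gwet's AC1 indicates substantial agreement, and agreement is markedly higher for acceptance than for rejection.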