Systematic Reviews (Feb 2020)

Inter-rater reliability and validity of risk of bias instrument for non-randomized studies of exposures: a study protocol

  • Maya M. Jeyaraman,
  • Nameer Al-Yousif,
  • Reid C. Robson,
  • Leslie Copstein,
  • Chakrapani Balijepalli,
  • Kimberly Hofer,
  • Mir S. Fazeli,
  • Mohammed T. Ansari,
  • Andrea C. Tricco,
  • Rasheda Rabbani,
  • Ahmed M. Abou-Setta

DOI
https://doi.org/10.1186/s13643-020-01291-z
Journal volume & issue
Vol. 9, no. 1
pp. 1 – 12

Abstract

Background

A new tool, the "risk of bias (ROB) instrument for non-randomized studies of exposures (ROB-NRSE)," was recently developed. It is important to establish consistency in its application and interpretation across review teams, and to understand whether specialized training and guidance improve the reliability of assessments. The objective of this cross-sectional study is therefore to establish the inter-rater reliability (IRR), inter-consensus reliability (ICR), and concurrent validity of the new ROB-NRSE tool. Furthermore, as this is a relatively new tool, it is important to understand the barriers to using it (e.g., the time needed to conduct assessments and reach consensus, i.e., evaluator burden).

Methods

Reviewers from four participating centers will appraise the ROB of a sample of NRSE publications using the ROB-NRSE tool in two stages. For IRR and ICR, two pairs of reviewers will assess the ROB of each NRSE publication. In the first stage, reviewers will assess ROB without any formal guidance; in the second stage, they will be provided customized training and guidance. At each stage, each pair of reviewers will resolve conflicts and arrive at a consensus. To calculate IRR and ICR, we will use Gwet's AC1 statistic. For concurrent validity, reviewers will appraise a sample of NRSE publications using both the Newcastle-Ottawa Scale (NOS) and the ROB-NRSE tool, and we will analyze the concordance between the two tools for similar domains and for the overall judgments using Kendall's tau coefficient. To measure evaluator burden, we will record the time taken to apply the ROB-NRSE tool (without and with guidance) and the NOS. To assess the impact of customized training and guidance on evaluator burden, we will use generalized linear models. We will use Microsoft Excel and SAS 9.4 to manage and analyze study data, respectively.
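The two statistics named in the Methods can be sketched in a few lines of code. This is only an illustrative, simplified implementation: the protocol states the analyses will be run in SAS 9.4, and production implementations handle multiple raters, tie corrections, and confidence intervals. The category labels and the tau-a variant below are assumptions for illustration (tau-b, which corrects for ties, may be what a statistical package actually reports).

```python
from collections import Counter
from itertools import combinations

def gwets_ac1(ratings_a, ratings_b, categories):
    """Gwet's AC1 chance-corrected agreement between two raters.

    ratings_a, ratings_b: parallel lists of categorical judgments
    (e.g., ROB ratings for the same set of publications).
    """
    n = len(ratings_a)
    q = len(categories)
    # Observed agreement: proportion of items rated identically
    pa = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Category prevalence, pooled over both raters
    counts = Counter(ratings_a) + Counter(ratings_b)
    pi = [counts[k] / (2 * n) for k in categories]
    # Gwet's chance-agreement probability
    pe = sum(p * (1 - p) for p in pi) / (q - 1)
    return (pa - pe) / (1 - pe)

def kendall_tau_a(x, y):
    """Kendall's tau-a for paired ordinal scores (no tie correction)."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

For example, two raters' judgments over four publications, `["low", "low", "high", "low"]` and `["low", "high", "high", "low"]`, agree on 3 of 4 items and yield an AC1 of about 0.53; concordance between two tools' ordinal scores would be fed to `kendall_tau_a` after coding judgments as integers (e.g., low = 1, moderate = 2, high = 3).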
Discussion

The quality of evidence from systematic reviews that include NRSE depends partly on the study-level ROB assessments. The findings of this study will contribute to an improved understanding of the ROB-NRSE tool and how best to use it.

Keywords