Reliability of trachoma clinical grading--assessing grading of marginal cases.

Salman A Rahman; Sun N Yu; Abdou Amza; Sintayehu Gebreselassie; Boubacar Kadri; Nassirou Baido; Nicole E Stoller; Joseph P Sheehan; Travis C Porco; Bruce D Gaynor; Jeremy D Keenan; Thomas M Lietman

doi:10.1371/journal.pntd.0002840

PLoS Neglected Tropical Diseases (May 2014)



Journal volume & issue: Vol. 8, no. 5, p. e2840

Abstract


Background: Clinical examination of trachoma is used to justify intervention in trachoma-endemic regions. Currently, field graders are certified by determining their concordance with experienced graders using the kappa statistic. Unfortunately, trachoma grading can be highly variable and there are cases where even expert graders disagree (borderline/marginal cases). Prior work has shown that inclusion of borderline cases tends to reduce apparent agreement, as measured by kappa. Here, we confirm those results and assess performance of trainees on these borderline cases by calculating their reliability error, a measure derived from the decomposition of the Brier score.

Methods and findings: We trained 18 field graders using 200 conjunctival photographs from a community-randomized trial in Niger and assessed inter-grader agreement using kappa as well as reliability error. Three experienced graders scored each case for the presence or absence of trachomatous inflammation-follicular (TF) and trachomatous inflammation-intense (TI). A consensus grade for each case was defined as the one given by a majority of experienced graders. We classified cases into a unanimous subset if all 3 experienced graders gave the same grade. For both TF and TI grades, the mean kappa for trainees was higher on the unanimous subset; inclusion of borderline cases reduced apparent agreement by 15.7% for TF and 12.4% for TI. When we assessed the breakdown of the reliability error, we found that our trainees tended to over-call TF grades and under-call TI grades, especially in borderline cases.

Conclusions: The kappa statistic is widely used for certifying trachoma field graders. Exclusion of borderline cases, which even experienced graders disagree on, increases apparent agreement with the kappa statistic. Graders may agree less when exposed to the full spectrum of disease. Reliability error allows for the assessment of these borderline cases and can be used to refine an individual trainee's grading.
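The two agreement measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give their exact formulas, so the code assumes Cohen's kappa for two binary graders and the reliability term of the standard Murphy decomposition of the Brier score, with a trainee's 0/1 call treated as the forecast and the consensus grade as the outcome. The function names and the eight example grades are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def cohen_kappa(grades_a, grades_b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) grades."""
    n = len(grades_a)
    # Observed proportion of cases on which the two graders agree.
    p_obs = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Agreement expected by chance from each grader's marginal positive rate.
    pa = sum(grades_a) / n
    pb = sum(grades_b) / n
    p_exp = pa * pb + (1 - pa) * (1 - pb)
    if p_exp == 1:
        return 1.0  # degenerate case: both graders gave one identical grade throughout
    return (p_obs - p_exp) / (1 - p_exp)


def reliability_error(forecasts, outcomes):
    """Reliability term of the Murphy decomposition of the Brier score.

    Cases are grouped by forecast value f_k; the term is
    (1/N) * sum_k n_k * (f_k - o_k)^2, where o_k is the observed
    frequency of positive outcomes within group k.
    """
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    n = len(forecasts)
    return sum(
        len(obs) * (f - sum(obs) / len(obs)) ** 2 for f, obs in groups.items()
    ) / n


# Hypothetical grades for eight photographs (1 = sign present, 0 = absent).
trainee = [1, 1, 1, 0, 0, 1, 0, 0]
consensus = [1, 1, 0, 0, 0, 0, 0, 1]

kappa = cohen_kappa(trainee, consensus)
rel = reliability_error(trainee, consensus)
print(f"kappa = {kappa:.3f}, reliability error = {rel:.5f}")
```

Under these assumptions, a reliability error of 0 means that each call the trainee makes matches the consensus frequency for that call. Within the group of positive calls, a gap between the forecast (1) and the observed frequency reflects over-calling; within the group of negative calls, a gap reflects under-calling.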

Published in PLoS Neglected Tropical Diseases

ISSN: 1935-2727 (Print); 1935-2735 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine: Internal medicine: Special situations and conditions: Arctic medicine. Tropical medicine; Medicine: Public aspects of medicine
Website: https://journals.plos.org/plosntds/
