Scientific Reports (May 2023)
Unsatisfactory reproducibility of interstitial inflammation scoring in allograft kidney biopsy
Abstract
Abstract Interstitial inflammation scoring is incorporated into the Banff Classification of Renal Allograft Pathology and is essential for the diagnosis of T-cell mediated rejection. However, its reproducibility, including inter-rater and intra-rater reliabilities, has not been carefully investigated. In this study, eight renal pathologists from different hospitals independently scored 45 kidney allograft biopsies with varying extents of interstitial inflammation. Inter-rater reliabilities and intra-rater reliabilities were investigated by kappa statistics and conditional agreement probabilities. Individual pathologists’ scoring patterns were examined by chi-squared tests and proportions tests. The mean pairwise kappa values for inter-rater reliability were 0.27, 0.30, and 0.26 for the Banff i score, ti score, and i-IFTA, respectively. No rater pair performed consistently better or worse than others on all three scorings. After dichotomizing the scores into two groups (none/mild and moderate/severe inflammation), the averaged conditional agreements ranged from 47.1% to 50.0%. The distributions of the scores differed, but some pathologists persistently scored higher or lower than others. Given the important role of interstitial inflammation scoring in the diagnosis of T-cell mediated rejection, transplant practitioners should be aware of the possible clinical implications of the far-from-optimal reproducibility.