Journal of Pathology Informatics (Jan 2021)

Selection of representative histologic slides in interobserver reproducibility studies: Insights from expert review for ovarian carcinoma subtype classification

  • Marios A Gavrielides,
  • Brigitte M Ronnett,
  • Russell Vang,
  • Fahime Sheikhzadeh,
  • Jeffrey D Seidman

DOI
https://doi.org/10.4103/jpi.jpi_56_20
Journal volume & issue
Vol. 12, no. 1
p. 15

Abstract


Background: Observer studies in pathology often utilize a limited number of representative slides per case, selected and reported in a nonstandardized manner. Reference diagnoses are commonly assumed to be generalizable to all slides of a case. We examined these issues in the context of pathologist concordance for histologic subtype classification of ovarian carcinomas (OCs).

Materials and Methods: A cohort of 114 OCs, consisting of 72 cases with a single representative slide (Group 1) and 42 cases with multiple representative slides (148 slides, 2–6 sections per case; Group 2), was independently reviewed by three experts in gynecologic pathology (case-based review). In a follow-up study, each individual slide was independently reviewed in randomized order by the same pathologists (section-based review).

Results: Average interobserver concordance varied from 100% for Group 1 to 64.3% for Group 2 (86.8% across all cases). Across Group 2, 19 cases (45.2%) had at least one slide classified as a different subtype than the subtype assigned from case-based review, demonstrating the impact of intratumoral heterogeneity. Section-based concordance across individual sections from Group 2 was comparable to case-based concordance for those cases, indicating diagnostic challenges at the individual section level. The findings demonstrate the increased diagnostic complexity of heterogeneous tumors that require sampling of multiple sections and its impact on pathologist performance.

Conclusions: The proportion of cases with multiple representative slides in cohorts used in validation studies, such as those conducted to evaluate artificial intelligence/machine learning tools, can influence diagnostic performance and, if not accounted for, can cause disparities between research and real-world observations and between research studies. Case selection in validation studies should account for tumor heterogeneity to create datasets that are balanced in terms of diagnostic complexity.
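The abstract reports average interobserver concordance as a percentage but does not state the exact formula used. The sketch below illustrates one common approach, mean pairwise percent agreement among the three reviewers; it is an assumption for illustration only, and the subtype labels and case data are hypothetical, not taken from the study.

```python
# Hypothetical sketch: mean pairwise percent agreement among three reviewers.
# Assumes one subtype label per reviewer per case; not the study's actual method.
from itertools import combinations

def pairwise_concordance(labels_per_case):
    """labels_per_case: list of per-case tuples, one subtype label per reviewer.
    Returns mean pairwise percent agreement across cases (0-100)."""
    per_case_agreement = []
    for labels in labels_per_case:
        pairs = list(combinations(labels, 2))
        agree = sum(1 for a, b in pairs if a == b)
        per_case_agreement.append(100.0 * agree / len(pairs))
    return sum(per_case_agreement) / len(per_case_agreement)

# Illustrative example: three reviewers, three cases.
cases = [
    ("HGSC", "HGSC", "HGSC"),           # full agreement -> 100%
    ("HGSC", "HGSC", "Endometrioid"),   # one dissent    -> 33.3%
    ("Clear cell", "Clear cell", "Clear cell"),
]
print(f"Average interobserver concordance: {pairwise_concordance(cases):.1f}%")
```

Under this assumption, case-based and section-based concordance would differ only in whether the unit passed to the function is a whole case or an individual slide.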

Keywords