BMC Biology (May 2025)

A comprehensive evaluation of diversity measures for TCR repertoire profiling

  • Justyna Mika,
  • Alicja Polanska,
  • Kim RM Blenman,
  • Lajos Pusztai,
  • Joanna Polanska,
  • Serge Candéias,
  • Michal Marczyk

DOI
https://doi.org/10.1186/s12915-025-02236-5
Journal volume & issue
Vol. 23, no. 1
pp. 1 – 18

Abstract

Background: T cells play a crucial role in adaptive immunity, as they monitor internal and external immunogenic signals through their specific receptors (TCRs). Using high-throughput sequencing, one can assess the TCR repertoire in various clinical settings and describe it quantitatively by calculating a diversity index. Multiple diversity indices that capture the richness of TCRs and the evenness of their distribution have been proposed in the literature; however, there is no consensus on gold-standard measures, and the interpretation of each index is complex. Our goal was to examine the performance characteristics of 12 commonly used diversity indices in simulated and real-world data.

Results: Simulated data were generated using three nonparametric models to evaluate how data richness and evenness affect index values. Fourteen real-world TCR datasets were obtained to examine how indices differ across analysis protocols and to test their robustness to subsampling. Pielou, Basharin, d50, and Gini primarily describe evenness and are highly correlated with one another. They are best suited for measuring the representation of TCR clones. Richness is best captured by the S index, followed by Chao1 and ACE, which also consider information on evenness. Shannon, Inv.Simpson, D3, D4, and Gini.Simpson measure richness while incorporating progressively more information on evenness. More skewed TCR distributions provided more stable results in subsampling. Gini-Simpson, Pielou, and Basharin were the most robust in both simulated and experimental data.

Conclusions: Our results could guide investigators in selecting the best diversity index for their particular experimental question and draw attention to factors that can influence the accuracy and reproducibility of results.
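To make the distinction between richness and evenness concrete, the following minimal Python sketch computes a few of the indices named in the abstract from a list of clone counts. It uses common textbook definitions (natural-log Shannon entropy, Pielou as Shannon divided by the log of richness, Gini-Simpson as one minus the sum of squared frequencies); these are assumptions and may differ in detail from the definitions used in the paper. The function name `diversity_indices` and the dictionary keys are hypothetical, chosen for illustration only.

```python
import math
from typing import Dict, Sequence


def diversity_indices(clone_counts: Sequence[int]) -> Dict[str, float]:
    """Compute a few diversity indices from per-clone read counts.

    Textbook definitions are assumed here; they are not taken from the paper.
    """
    # Ignore clones with zero reads: they do not contribute to any index.
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("at least one clone with a positive count is required")

    total = sum(counts)
    freqs = [c / total for c in counts]

    # Richness (S): number of distinct clones observed.
    richness = len(counts)

    # Shannon entropy with the natural logarithm.
    shannon = -sum(p * math.log(p) for p in freqs)

    # Pielou evenness: Shannon scaled by its maximum, log(S).
    # It is undefined for a single clone, so NaN is returned in that case.
    pielou = shannon / math.log(richness) if richness > 1 else float("nan")

    # Simpson concentration: probability that two random reads share a clone.
    concentration = sum(p * p for p in freqs)

    return {
        "S": float(richness),
        "Shannon": shannon,
        "Pielou": pielou,
        "Gini.Simpson": 1.0 - concentration,
        "Inv.Simpson": 1.0 / concentration,
    }


# Two toy repertoires with the same richness but different evenness.
even_repertoire = diversity_indices([10, 10, 10, 10])
skewed_repertoire = diversity_indices([97, 1, 1, 1])
```

Both toy repertoires contain four clones, so S is identical, while the evenness-sensitive indices (Pielou, Gini.Simpson, Inv.Simpson) are lower for the skewed repertoire dominated by a single clone.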

Keywords