BMC Immunology (Feb 2024)

Systematic evaluation of B-cell clonal family inference approaches

  • Daria Balashova,
  • Barbera D. C. van Schaik,
  • Maria Stratigopoulou,
  • Jeroen E. J. Guikema,
  • Tom G. Caniels,
  • Mathieu Claireaux,
  • Marit J. van Gils,
  • Anne Musters,
  • Dornatien C. Anang,
  • Niek de Vries,
  • Victor Greiff,
  • Antoine H. C. van Kampen

DOI
https://doi.org/10.1186/s12865-024-00600-8
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 22

Abstract

The reconstruction of clonal families (CFs) in B-cell receptor (BCR) repertoire analysis is a crucial step in understanding the adaptive immune system and how it responds to antigens. The BCR repertoire of an individual is formed throughout life and is diverse due to several factors, such as gene recombination and somatic hypermutation. Adaptive Immune Receptor Repertoire sequencing (AIRR-seq), based on next-generation sequencing, has enabled the generation of full BCR repertoires that also include rare CFs. The reconstruction of CFs from AIRR-seq data is challenging, and several approaches have been developed to solve this problem. Currently, most methods use the heavy chain (HC) only, as it is more variable than the light chain (LC). CF reconstruction options include the definition of appropriate sequence similarity measures, the use of shared mutations among sequences, and the possibility of reconstruction without preliminary clustering based on V- and J-gene annotation. In this study, we aimed to systematically evaluate different approaches for CF reconstruction and to determine their impact on various outcome measures, such as the number of CFs derived, the size of the CFs, and the accuracy of the reconstruction. The methods were compared to each other, to a method that groups sequences based on identical junction sequences, and to a method that only determines subclones. We found that, after accounting for data set variability, in particular sequencing depth and mutation load, the reconstruction approach affects some of the outcome measures, including the number of CFs. Simulations indicate that unique junctions and subclones should not be used as substitutes for CFs and that more complex methods do not outperform simpler methods. We also conclude that the approaches differ in their ability to correctly reconstruct CFs when the LC is not considered and in their ability to identify shared CFs. The results show the effect of different approaches on the reconstruction of CFs and highlight the importance of choosing an appropriate method.
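As an illustration of the simplest baseline mentioned in the abstract, grouping sequences by identical junction, the following is a minimal Python sketch. It is not the authors' implementation. The function name `group_by_junction`, the `use_vj_genes` switch, and the toy records are hypothetical; the field names `sequence_id`, `v_call`, `j_call`, and `junction` are assumed to follow the AIRR Community rearrangement format.

```python
from collections import defaultdict


def group_by_junction(records, use_vj_genes=True):
    """Group sequence IDs that share an identical junction sequence.

    If use_vj_genes is True, sequences must also share the same V- and
    J-gene annotation to land in the same group (preliminary V/J
    clustering). Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for rec in records:
        key = (rec["junction"],)
        if use_vj_genes:
            key = (rec["v_call"], rec["j_call"]) + key
        groups[key].append(rec["sequence_id"])
    # Groups are returned in order of first appearance in the input.
    return list(groups.values())


# Toy heavy-chain records (made-up values, not data from the study).
records = [
    {"sequence_id": "s1", "v_call": "IGHV1-2", "j_call": "IGHJ4",
     "junction": "TGTGCGAGAGATTAC"},
    {"sequence_id": "s2", "v_call": "IGHV1-2", "j_call": "IGHJ4",
     "junction": "TGTGCGAGAGATTAC"},
    {"sequence_id": "s3", "v_call": "IGHV3-23", "j_call": "IGHJ4",
     "junction": "TGTGCGAGAGATTAC"},
    {"sequence_id": "s4", "v_call": "IGHV1-2", "j_call": "IGHJ4",
     "junction": "TGTGCGAGAGATTAT"},
]

with_vj = group_by_junction(records, use_vj_genes=True)
without_vj = group_by_junction(records, use_vj_genes=False)
```

In this toy input, `s4` differs from `s1` by a single junction nucleotide and is therefore placed in its own group. This shows why exact junction identity can split sequences that a similarity-based method might assign to one CF, which is consistent with the abstract's caution against using unique junctions as a substitute for CFs.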

Keywords