BMC Bioinformatics (Oct 2008)
Recovering probabilities for nucleotide trimming processes for T cell receptor TRA and TRG V-J junctions analyzed with IMGT tools
Abstract
Abstract Background Nucleotides are trimmed from the ends of variable (V), diversity (D) and joining (J) genes during immunoglobulin (IG) and T cell receptor (TR) rearrangements in B cells and T cells of the immune system. This trimming is followed by addition of nucleotides at random, forming the N regions (N for nucleotides) of the V-J and V-D-J junctions. These processes are crucial for creating diversity in the immune response since the number of trimmed nucleotides and the number of added nucleotides vary in each B or T cell. IMGT® sequence analysis tools, IMGT/V-QUEST and IMGT/JunctionAnalysis, are able to provide detailed and accurate analysis of the final observed junction nucleotide sequences (tool "output"). However, as trimmed nucleotides can potentially be replaced by identical N region nucleotides during the process, the observed "output" represents a biased estimate of the "true trimming process." Results A probabilistic approach based on an analysis of the standardized tool "output" is proposed to infer the probability distribution of the "true trimmming process" and to provide plausible biological hypotheses explaining this process. We collated a benchmark dataset of TR alpha (TRA) and TR gamma (TRG) V-J rearranged sequences and junctions analysed with IMGT/V-QUEST and IMGT/JunctionAnalysis, the nucleotide sequence analysis tools from IMGT®, the international ImMunoGeneTics information system®, http://imgt.cines.fr. The standardized description of the tool output is based on the IMGT-ONTOLOGY axioms and concepts. We propose a simple first-order model that attempts to transform the observed "output" probability distribution into an estimate closer to the "true trimming process" probability distribution. We use this estimate to test the hypothesis that Poisson processes are involved in trimming. This hypothesis was not rejected at standard confidence levels for three of the four trimming processes: TRAV, TRAJ and TRGV. Conclusion By using trimming of rearranged TR genes as a benchmark, we show that a probabilistic approach, applied to IMGT® standardized tool "outputs" opens the way to plausible hypotheses on the events involved in the "true trimming process" and eventually to an exact quantification of trimming itself. With increasing high-throughput of standardized immunogenetics data, similar probabilistic approaches will improve understanding of processes so far only characterized by the "output" of standardized tools.