Fractionation statistics

Zheng Chunfang; Wang Baoyong; Sankoff David

doi:10.1186/1471-2105-12-S9-S5

BMC Bioinformatics (Oct 2011)

Fractionation statistics

Zheng Chunfang,
Wang Baoyong,
Sankoff David

Affiliations

Zheng Chunfang
Wang Baoyong
Sankoff David

DOI: https://doi.org/10.1186/1471-2105-12-S9-S5
Journal volume & issue: Vol. 12, no. Suppl 9
p. S5

Abstract

Read online

Abstract Background Paralog reduction, the loss of duplicate genes after whole genome duplication (WGD) is a pervasive process. Whether this loss proceeds gene by gene or through deletion of multi-gene DNA segments is controversial, as is the question of fractionation bias, namely whether one homeologous chromosome is more vulnerable to gene deletion than the other. Results As a null hypothesis, we first assume deletion events, on one homeolog only, excise a geometrically distributed number of genes with unknown mean µ, and these events combine to produce deleted runs of length l, distributed approximately as a negative binomial with unknown parameter r, itself a random variable with distribution π(·). A more realistic model requires deletion events on both homeologs distributed as a truncated geometric. We simulate the distribution of run lengths l in both models, as well as the underlying π(r), as a function of µ, and show how sampling l allows us to estimate µ. We apply this to data on a total of 15 genomes descended from 6 distinct WGD events and show how to correct the bias towards shorter runs caused by genome rearrangements. Because of the difficulty in deriving π(·) analytically, we develop a deterministic recurrence to calculate each π(r) as a function of µ and the proportion of unreduced paralog pairs. Conclusions The parameter µ can be estimated based on run lengths of single-copy regions. Estimates of µ in real data do not exclude the possibility that duplicate gene deletion is largely gene by gene, although it may sometimes involve longer segments.

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/

About the journal