Journal of Integrative Bioinformatics (Jun 2006)
Identification of embryo specific human isoforms using a database of predicted alternative splice forms
Abstract
Alternative splicing is one of the most important mechanisms for generating a large number of mRNA and protein isoforms from a small number of genes, and its study has become a major topic in computational genome analysis. The repository EASED (Extended Alternatively Spliced EST Database, http://eased.bioinf.mdc-berlin.de/) stores a large collection of splice variants predicted by comparing the human genome against EST databases. It enables the discovery of new, unpublished splice forms that could be interesting candidates for stage-specific, disease-specific or tissue-specific splicing. The main idea behind selecting specific splice forms is to compare the number and origin of ESTs confirming one isoform with the number and origin of ESTs confirming the opposite isoform. A measure, asDcs, is introduced to take into account the unequal distribution of ESTs per splice site on the one hand and the possible uncertainties due to the relatively low quality of EST data on the other. In the first step, the number of ESTs per splice site is scaled with a modified Hill function. In the second step, asDcs computes the distance of each pair of EST counts (adult and embryonic) from equipartition. Equipartition exists if, for every number of adult ESTs, there is the same number of embryonic ESTs. The effect of several input parameters for the scaling of true EST values is analysed and can be reproduced at http://cardigan.zbh.uni-hamburg.de/asDcs. Some of the best-scoring hits obtained for selected parameters (transcription factor P65, drebrin, and fetuin) have already been described in the literature as being involved in embryonic development. This result supports the plausibility of the approach, which looks promising for the identification of unpublished embryo-specific alternative splice sites in human.
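The two-step idea described above (scale the EST counts, then measure the deviation from equipartition) can be illustrated with a minimal sketch. It is not the asDcs implementation: the abstract does not specify the modified Hill function, so the standard Hill function is used here in its place, and the half-saturation constant `k`, the exponent `n`, the function names, and the choice of a signed perpendicular distance to the diagonal are all assumptions made for illustration.

```python
import math


def hill(x, k=5.0, n=2.0):
    """Standard Hill function, mapping a count x >= 0 into [0, 1).

    Assumption: the paper uses a *modified* Hill function whose exact
    form is not given in the abstract; k and n are illustrative values.
    """
    if x < 0:
        raise ValueError("EST counts must be non-negative")
    xn = x ** n
    return xn / (k ** n + xn)


def distance_from_equipartition(adult_ests, embryo_ests, k=5.0, n=2.0):
    """Signed distance of the scaled point (adult, embryo) from the diagonal.

    Equipartition corresponds to the diagonal adult == embryo, where the
    distance is 0. Positive values indicate more embryonic support,
    negative values more adult support (a hypothetical sign convention).
    """
    a = hill(adult_ests, k, n)
    e = hill(embryo_ests, k, n)
    # Perpendicular distance from the point (a, e) to the line e = a.
    return (e - a) / math.sqrt(2)


if __name__ == "__main__":
    # Hypothetical (adult, embryonic) EST counts for a few splice sites.
    for adult, embryo in [(3, 3), (0, 10), (10, 0), (1, 2)]:
        score = distance_from_equipartition(adult, embryo)
        print(adult, embryo, round(score, 3))
```

Because the scaling saturates, a splice site supported by 0 adult and 10 embryonic ESTs scores much higher than one supported by 1 adult and 2 embryonic ESTs, even though both are skewed towards the embryo. This reflects the intent of damping the influence of low, unreliable EST counts.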