Evaluation of two methods for computational HLA haplotypes inference using a real dataset

Peixoto Maria J; Couto Ana R; Fialho Raquel N; Santos Margarida R; Bettencourt Bruno F; Pinheiro João P; Spínola Hélder; Mora Marian G; Santos Cristina; Brehm António; Bruges-Armas Jácome

doi:10.1186/1471-2105-9-68

BMC Bioinformatics (Jan 2008)

Journal volume & issue: Vol. 9, no. 1, p. 68

Abstract

Background: HLA haplotype analysis has been used in population genetics and in the investigation of disease-susceptibility loci, owing to the high polymorphism of the HLA region. Several methods for inferring haplotypes from genotypic data have been proposed, but it is unclear how accurate each method is or which method is superior. The accuracy of two of the leading methods of computational haplotype inference, one based on the Expectation-Maximization algorithm (implemented in Arlequin V3.0) and one based on a Bayesian algorithm (implemented in PHASE V2.1.1), was compared using a set of 122 HLA haplotypes (A-B-Cw-DQB1-DRB1) determined through direct counting. Accuracy was measured with the Mean Squared Error (MSE), the Similarity Index (IF) and the Haplotype Identification Index (IH).

Results: Neither method inferred all of the known haplotypes, and some differences were observed in the accuracy of the two methods in terms of both haplotype determination and haplotype frequency estimation. Working with haplotypes composed of sites with low polymorphism, and present in more than one individual, increased the confidence in the assignment of haplotypes and in the estimation of the haplotype frequencies generated by both programs.

Conclusion: The method implemented in PHASE v2.1.1 had the best overall performance in both haplotype construction and frequency calculation, although the differences between the two methods were not substantial. To our knowledge, this was the first work aiming to test statistical methods using real haplotypic data from the HLA region.
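The abstract names three accuracy measures but does not define them. As a minimal sketch, the Python below assumes definitions commonly used in the haplotype inference literature: MSE as the mean squared difference between estimated and true frequencies over the union of haplotypes, IF as one minus half the summed absolute frequency differences, and IH as twice the number of correctly identified true haplotypes divided by the sum of the true and estimated haplotype counts. These formulas, the function name `accuracy_measures`, and the haplotype labels are illustrative assumptions, not taken from the paper, and the paper's exact definitions (for example, any frequency threshold applied before counting) may differ.

```python
def accuracy_measures(true_freqs, est_freqs):
    """Return (MSE, I_F, I_H) for two haplotype-frequency dictionaries.

    Assumed definitions (not confirmed by the abstract):
      MSE = mean over all haplotypes of (estimated - true)^2
      I_F = 1 - 0.5 * sum |estimated - true|
      I_H = 2 * (k_true - k_missed) / (k_true + k_est)
    """
    # Compare over every haplotype seen in either set; absent means frequency 0.
    haplotypes = set(true_freqs) | set(est_freqs)
    diffs = [est_freqs.get(h, 0.0) - true_freqs.get(h, 0.0) for h in haplotypes]

    mse = sum(d * d for d in diffs) / len(haplotypes)
    i_f = 1.0 - 0.5 * sum(abs(d) for d in diffs)

    # Count haplotypes with non-zero frequency in each set.
    k_true = sum(1 for f in true_freqs.values() if f > 0)
    k_est = sum(1 for f in est_freqs.values() if f > 0)
    # True haplotypes that the inference failed to recover.
    k_missed = sum(
        1 for h, f in true_freqs.items() if f > 0 and est_freqs.get(h, 0.0) <= 0
    )
    i_h = 2.0 * (k_true - k_missed) / (k_true + k_est)
    return mse, i_f, i_h


# Hypothetical toy data: two true haplotypes, one recovered and one replaced.
true_freqs = {"hapA": 0.5, "hapB": 0.5}
est_freqs = {"hapA": 0.5, "hapC": 0.5}
mse, i_f, i_h = accuracy_measures(true_freqs, est_freqs)
print(mse, i_f, i_h)
```

Under these assumed definitions, a perfect inference gives MSE = 0 and IF = IH = 1, so lower MSE and higher IF and IH indicate better agreement with the directly counted haplotypes.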

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
