Significance evaluation in factor graphs

Tobias Madsen; Asger Hobolth; Jens Ledet Jensen; Jakob Skou Pedersen

doi:10.1186/s12859-017-1614-z

BMC Bioinformatics (Mar 2017)


Affiliations

Tobias Madsen: Department of Molecular Medicine, Aarhus University
Asger Hobolth: Bioinformatics Research Center, Aarhus University
Jens Ledet Jensen: Department of Mathematics, Aarhus University
Jakob Skou Pedersen: Department of Molecular Medicine, Aarhus University

Journal volume & issue: Vol. 18, no. 1
pp. 1 – 12

Abstract


Background
Factor graphs provide a flexible and general framework for specifying probability distributions. They can capture a range of popular and recent models for the analysis of genomics data as well as data from other scientific fields. Owing to the ever larger data sets encountered in genomics and the multiple-testing issues that accompany them, accurate significance evaluation is of great importance. Here we address the problem of evaluating the statistical significance of observations from factor graph models.

Results
Two novel numerical approximations for evaluating statistical significance are presented: the first is based on importance sampling, the second on the saddlepoint approximation. We develop algorithms to compute the approximations efficiently and compare them to naive sampling and the normal approximation. The individual merits of the methods are analysed both theoretically and with simulations. A guideline for choosing between the normal approximation, the saddlepoint approximation and importance sampling is also provided. Finally, the applicability of the methods is demonstrated with examples from cancer genomics, motif analysis and phylogenetics.

Conclusions
The applicability of the saddlepoint approximation and importance sampling is demonstrated on known models in the factor graph framework. Using the two methods, we can substantially reduce computational cost without compromising accuracy. This contribution enables analyses of large datasets in the general factor graph framework.
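The abstract compares four ways of computing a tail probability P(S >= s): naive sampling, the normal approximation, a saddlepoint approximation and importance sampling. The following minimal Python sketch illustrates why the last two matter for small p-values. It uses a deliberately simple toy model and is not the authors' factor graph algorithms. It assumes the score S is a sum of n independent Exp(1) variables, so the exact tail, the cumulant generating function K(t) = -n log(1 - t) and the exponentially tilted sampling distribution are all available in closed form. In a general factor graph the cumulant generating function would itself have to be computed from the model, which this sketch sidesteps. The Lugannani-Rice formula for the saddlepoint estimate, exponential tilting at the saddlepoint for importance sampling, the function names and the parameter values are all assumptions made for illustration.

```python
import math
import random

# Toy model (an assumption for illustration, not the paper's setting):
# S = X_1 + ... + X_n with X_i ~ Exp(1) i.i.d., so S ~ Gamma(n, 1) and the
# cumulant generating function is K(t) = -n * log(1 - t) for t < 1.


def exact_tail(n, s):
    """P(S >= s) = exp(-s) * sum_{k=0}^{n-1} s^k / k! for S ~ Gamma(n, 1)."""
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= s / k
        total += term
    return math.exp(-s) * total


def normal_tail(n, s):
    """Normal approximation using the exact mean n and variance n."""
    z = (s - n) / math.sqrt(n)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def saddlepoint_tail(n, s):
    """Lugannani-Rice saddlepoint approximation, valid here for s > n."""
    assert s > n, "saddlepoint t must be positive (s above the mean)"
    t = 1.0 - n / s                  # solves K'(t) = n / (1 - t) = s
    K = -n * math.log(1.0 - t)       # K(t)
    K2 = n / (1.0 - t) ** 2          # K''(t)
    w = math.copysign(math.sqrt(2.0 * (t * s - K)), t)
    u = t * math.sqrt(K2)
    phi = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return 0.5 * math.erfc(w / math.sqrt(2.0)) + phi * (1.0 / u - 1.0 / w)


def naive_tail(n, s, draws=20000, seed=1):
    """Plain Monte Carlo: fraction of simulated scores that are at least s."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        if sum(rng.expovariate(1.0) for _ in range(n)) >= s:
            hits += 1
    return hits / draws


def importance_sampling_tail(n, s, draws=20000, seed=1):
    """Exponential tilting at the saddlepoint, so the tilted mean of S is s."""
    rng = random.Random(seed)
    t = 1.0 - n / s
    K = -n * math.log(1.0 - t)
    acc = 0.0
    for _ in range(draws):
        # Under the tilted measure each X_i ~ Exp(rate = 1 - t).
        S = sum(rng.expovariate(1.0 - t) for _ in range(n))
        if S >= s:
            acc += math.exp(-t * S + K)  # likelihood ratio f(x) / f_t(x)
    return acc / draws


if __name__ == "__main__":
    n, s = 10, 40.0
    for name, f in [("exact", exact_tail), ("normal", normal_tail),
                    ("saddlepoint", saddlepoint_tail), ("naive", naive_tail),
                    ("importance", importance_sampling_tail)]:
        print(f"{name:12s} {f(n, s):.3e}")
```

With n = 10 and s = 40 the target probability lies far in the tail. Naive sampling with 20,000 draws will almost surely return 0, and the normal approximation underestimates the tail by many orders of magnitude. The saddlepoint and tilted importance-sampling estimates should both land close to the exact value, at a small fraction of the cost naive sampling would need.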

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
