Frontiers in Genetics (Mar 2013)
Maximal information component analysis: a novel non-linear network analysis method
Abstract
Background: Network construction and analysis algorithms let scientists sift through high-throughput biological data, such as transcription microarrays, for small groups of genes (modules) that merit further research. Most of these algorithms ignore two phenomena that are clearly observed in biological systems: nonlinear interactions in the data, and the ability of genes to participate in multiple functional groups at once.

Results: We have created a novel co-expression network analysis algorithm that incorporates both principles by combining the information-theoretic association measure of the Maximal Information Coefficient (MIC) with an Interaction Component Model. We evaluate this approach on two datasets collected from a large panel of mice, one from macrophages and the other from liver, by comparing it with a widely used co-expression analysis method, Weighted Gene Coexpression Network Analysis (WGCNA), on three criteria: module entropy, Gene Ontology (GO) enrichment, and scale-free topology fit. By these criteria, our algorithm, termed Maximal Information Component Analysis (MICA), outperforms WGCNA on the macrophage data and returns comparable results on the liver data. We show that the macrophage data contains more nonlinear interactions than the liver data, which may explain MICA's better performance in that case.

Conclusions: By making our network algorithm more accurately reflect known biological principles, we are able to generate modules with improved relevance, particularly in networks with confounding factors such as gene-by-environment interactions.
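Of the evaluation criteria above, scale-free topology fit is the most mechanical. In WGCNA-style analyses it is commonly computed by binning node connectivities k, estimating the fraction of nodes p(k) in each bin, and regressing log10 p(k) on log10 k; a high R² with a negative slope indicates an approximately scale-free network. The sketch below assumes that procedure. The function name `scale_free_fit`, the equal-width binning, and the default of 10 bins are illustrative choices, not details taken from the MICA paper.

```python
import math
from typing import Sequence


def scale_free_fit(connectivity: Sequence[float], n_bins: int = 10) -> tuple[float, float]:
    """Return (r_squared, slope) of log10 p(k) regressed on log10 k.

    Hypothetical helper: connectivities are split into n_bins equal-width
    bins, and each non-empty bin with positive mean k contributes one point
    (log10 of the mean k in the bin, log10 of the fraction of nodes in it).
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    k = [float(x) for x in connectivity]
    lo, hi = min(k), max(k)
    width = (hi - lo) / n_bins or 1.0  # guard against all-equal connectivity

    bins: list[list[float]] = [[] for _ in range(n_bins)]
    for x in k:
        idx = min(int((x - lo) / width), n_bins - 1)  # the maximum goes in the last bin
        bins[idx].append(x)

    xs: list[float] = []
    ys: list[float] = []
    for members in bins:
        if members:
            mean_k = sum(members) / len(members)
            if mean_k > 0:  # log10 is undefined for zero connectivity
                xs.append(math.log10(mean_k))
                ys.append(math.log10(len(members) / len(k)))

    if len(xs) < 2:
        raise ValueError("need at least two non-empty bins to fit a line")

    # Ordinary least squares on the (log k, log p(k)) points.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    # A perfectly flat p(k) gives no scale-free signal; report R^2 = 0 then.
    r_squared = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return r_squared, slope


# Usage: many weakly connected genes and a few hubs.
degrees = [1] * 160 + [2] * 40 + [4] * 10 + [8] * 3 + [16] * 1
r2, slope = scale_free_fit(degrees)
print(f"R^2 = {r2:.3f}, slope = {slope:.2f}")
```

Equal-width bins keep the sketch short; with real connectivity vectors the bin count and the handling of sparse high-k bins noticeably affect R², so results are only comparable when both networks are scored with the same settings.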
Keywords