Delineation of amplification, hybridization and location effects in microarray data yields better-quality normalization

van Someren Eugene P; Mentink Anouk; Hulsman Marc; Dechering Koen J; de Boer Jan; Reinders Marcel JT

doi:10.1186/1471-2105-11-156

BMC Bioinformatics (Mar 2010)

Journal volume & issue: Vol. 11, no. 1, p. 156

Abstract

Background: Oligonucleotide arrays have become one of the most widely used high-throughput tools in biology. Due to their sensitivity to experimental conditions, normalization is a crucial step when comparing measurements from these arrays. Normalization is, however, far from a solved problem. Frequently, we encounter datasets with significant technical effects that currently available methods are not able to correct.

Results: We show that by a careful decomposition of probe-specific amplification, hybridization and array location effects, a normalization can be performed that allows for a much improved analysis of these data. Identification of the technical sources of variation between arrays has allowed us to build statistical models that are used to estimate how the signal of individual probes is affected, based on their properties. This enables a model-based normalization that is probe-specific, in contrast with the signal intensity distribution normalization performed by many current methods. In addition, we propose a novel way of handling background correction, enabling the use of background information to weight probes during summarization. Testing of the proposed method shows a much improved detection of differentially expressed genes over earlier proposed methods, even when tested on (experimentally tightly controlled and replicated) spike-in datasets.

Conclusions: When a limited number of arrays are available, or when arrays are run in different batches, technical effects have a large influence on the measured expression of genes. We show that a detailed modelling and correction of these technical effects allows for an improved analysis in these situations.
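
The idea of using background information to weight probes during summarization can be illustrated with a minimal sketch. The abstract does not give the authors' actual weighting scheme, so this is only an assumption-laden illustration: the function name `weighted_probe_summary` and the weight definition (the fraction of raw intensity above the background estimate) are hypothetical and are not taken from the paper.

```python
import numpy as np

def weighted_probe_summary(intensity, background):
    """Summarize one probe set on one array as a weighted mean of log2 intensity.

    Hypothetical scheme (not the paper's): each probe is weighted by the
    fraction of its raw intensity that lies above its background estimate.
    Assumes all raw intensities are strictly positive.
    """
    intensity = np.asarray(intensity, dtype=float)
    background = np.asarray(background, dtype=float)
    log_signal = np.log2(intensity)

    # Probes at or below background get weight 0; weights are clipped to [0, 1].
    weights = np.clip((intensity - background) / intensity, 0.0, 1.0)

    # If no probe rises above background, fall back to the unweighted mean.
    if weights.sum() == 0.0:
        return float(log_signal.mean())
    return float(np.average(log_signal, weights=weights))
```

In this sketch a probe dominated by background contributes little to the probe-set summary instead of being corrected by subtraction, which avoids taking the logarithm of zero or negative values.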

Published in BMC Bioinformatics

ISSN: 1471-2105 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Biology (General)
Website: http://www.biomedcentral.com/bmcbioinformatics/
