BMC Bioinformatics (Nov 2004)

Noise filtering and nonparametric analysis of microarray data underscores discriminating markers of oral, prostate, lung, ovarian and breast cancer

  • Dermody James J,
  • Cheng Jeff,
  • Cody Michael J,
  • Aris Virginie M,
  • Soteropoulos Patricia,
  • Recce Michael,
  • Tolias Peter P

DOI
https://doi.org/10.1186/1471-2105-5-185
Journal volume & issue
Vol. 5, no. 1
p. 185

Abstract

Read online

Abstract Background A major goal of cancer research is to identify discrete biomarkers that specifically characterize a given malignancy. These markers are useful in diagnosis, may identify potential targets for drug development, and can aid in evaluating treatment efficacy and predicting patient outcome. Microarray technology has enabled marker discovery from human cells by permitting measurement of steady-state mRNA levels derived from thousands of genes. However many challenging and unresolved issues regarding the acquisition and analysis of microarray data remain, such as accounting for both experimental and biological noise, transcripts whose expression profiles are not normally distributed, guidelines for statistical assessment of false positive/negative rates and comparing data derived from different research groups. This study addresses these issues using Affymetrix HG-U95A and HG-U133 GeneChip data derived from different research groups. Results We present here a simple non parametric approach coupled with noise filtering to identify sets of genes differentially expressed between the normal and cancer states in oral, breast, lung, prostate and ovarian tumors. An important feature of this study is the ability to integrate data from different laboratories, improving the analytical power of the individual results. One of the most interesting findings is the down regulation of genes involved in tissue differentiation. Conclusions This study presents the development and application of a noise model that suppresses noise, limits false positives in the results, and allows integration of results from individual studies derived from different research groups.