Preprocessing and Quality Control Strategies for Illumina DASL Assay-Based Brain Gene Expression Studies with Semi-Degraded Samples

Maggie L Chow; Mary E Winn; Hai-Ri Li; Craig April; Anthony Wynshaw-Boris; Jian-Bing Fan; Xiang-Dong Fu; Eric Courchesne; Nicholas Schork

doi:10.3389/fgene.2012.00011

Frontiers in Genetics (Feb 2012)

Affiliations

Maggie L Chow: University of California San Diego
Mary E Winn: The Scripps Translational Science Institute
Hai-Ri Li: University of California San Diego
Craig April: Illumina, Inc.
Anthony Wynshaw-Boris: University of California San Francisco
Jian-Bing Fan: Illumina, Inc.
Xiang-Dong Fu: University of California San Diego
Eric Courchesne: University of California San Diego
Nicholas Schork: The Scripps Translational Science Institute

Journal volume & issue: Vol. 3

Abstract

Available statistical preprocessing or quality control analysis tools for gene expression microarray datasets are known to greatly affect downstream data analysis, especially when degraded samples, unique tissue samples, or novel expression assays are used. It is therefore important to assess the validity and impact of the assumptions built into the preprocessing scheme applied to a dataset. We developed and assessed a data preprocessing strategy for use with the Illumina DASL-based gene expression assay with partially degraded postmortem prefrontal cortex samples. The samples were obtained from individuals with autism as part of an investigation of the pathogenic factors contributing to autism. Using statistical analysis methods and metrics such as those associated with multivariate distance matrix regression (MDMR) and mean inter-array correlation, we developed a DASL-based assay gene expression preprocessing pipeline to detect and accommodate problems with microarray-based gene expression values obtained with degraded brain samples. Key steps in the pipeline included outlier exclusion, data transformation and normalization, and batch effect and covariate corrections. Our goal was to produce a clean dataset for subsequent downstream differential expression analysis. We ultimately settled on available transformation and normalization algorithms in the R/Bioconductor package lumi based on an assessment of their use in various combinations. A procedure combining log2 transformation, quantile normalization, and correction for batch and seizure was likely the most appropriate for our data. We empirically tested different components of our proposed preprocessing strategy, and our results suggest that a strategy like ours, which effectively identifies outliers, normalizes the data, and corrects for batch effects, can be applied to all studies, even those pursued with degraded samples.
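
To make three of the steps named in the abstract concrete, the following is a minimal Python sketch of log2 transformation, quantile normalization, and mean inter-array correlation as an outlier metric. It is an illustration only and not the authors' pipeline, which used the R/Bioconductor package lumi. The function names, the toy intensity values, and the tie handling are all assumptions made for this example, and batch correction, covariate correction, and MDMR are not covered.

```python
import math

def log2_transform(arrays):
    """Log2-transform every intensity (assumes strictly positive values)."""
    return [[math.log2(v) for v in arr] for arr in arrays]

def quantile_normalize(arrays):
    """Give every array the same empirical distribution.

    The reference distribution is the mean, across arrays, of the values at
    each rank. Ties are broken by probe position, which is a simplification.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(arr) for arr in arrays]
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for arr in arrays:
        order = sorted(range(n_probes), key=lambda i: arr[i])
        out = [0.0] * n_probes
        for rank, probe_index in enumerate(order):
            out[probe_index] = reference[rank]
        normalized.append(out)
    return normalized

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def mean_inter_array_correlation(arrays):
    """For each array, average its correlation with every other array."""
    k = len(arrays)
    return [
        sum(pearson(arrays[i], arrays[j]) for j in range(k) if j != i) / (k - 1)
        for i in range(k)
    ]

# Hypothetical toy data: four arrays (samples) by six probes.
# The last array has a scrambled profile to mimic a poor-quality sample.
raw = [
    [120, 340, 560, 800, 1500, 3000],
    [130, 360, 500, 850, 1400, 3200],
    [110, 300, 600, 780, 1600, 2900],
    [2500, 900, 150, 1300, 400, 700],
]

normalized = quantile_normalize(log2_transform(raw))
iac = mean_inter_array_correlation(normalized)
candidate_outlier = iac.index(min(iac))  # array with the lowest mean correlation
print(candidate_outlier, [round(v, 3) for v in iac])
```

In practice an array would be flagged only if its mean inter-array correlation falls well below that of the other arrays; the cutoff is a study-specific choice that this sketch does not try to set.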

Published in Frontiers in Genetics

ISSN: 1664-8021 (Online)
Publisher: Frontiers Media S.A.
Country of publisher: Switzerland
LCC subjects: Science: Biology (General): Genetics
Website: http://journal.frontiersin.org/journal/genetics