reGenotyper: Detecting mislabeled samples in genetic data.

Konrad Zych; Basten L Snoek; Mark Elvin; Miriam Rodriguez; K Joeri Van der Velde; Danny Arends; Harm-Jan Westra; Morris A Swertz; Gino Poulin; Jan E Kammenga; Rainer Breitling; Ritsert C Jansen; Yang Li

doi:10.1371/journal.pone.0171324

PLoS ONE (Jan 2017)

Journal volume & issue: Vol. 12, no. 2, p. e0171324

Abstract

In high-throughput molecular profiling studies, genotype labels can be wrongly assigned at various experimental steps; the resulting mislabeled samples seriously reduce the power to detect the genetic basis of phenotypic variation. We have developed an approach to detect potential mislabeling, recover the "ideal" genotype and identify "best-matched" labels for mislabeled samples. On average, we identified 4% of samples as mislabeled in eight published datasets, highlighting the necessity of applying a "data cleaning" step before standard data analysis.
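The general idea of flagging a mislabeled sample and proposing a "best-matched" label can be illustrated with a toy sketch. This is not the reGenotyper method, whose details are not given in this record; it is a plain concordance check under stated assumptions. It assumes each sample already has a genotype inferred from its molecular profile (the `inferred` dictionary, hypothetical data), compares that against the genotypes recorded under every label, and flags samples that agree poorly with their own label. The function names, the 0/1 marker coding, and the 0.8 threshold are all illustrative choices.

```python
from typing import Dict, List, Tuple

Genotype = List[int]  # one allele code (0 or 1) per marker; coding is an assumption


def concordance(a: Genotype, b: Genotype) -> float:
    """Fraction of markers at which two genotype vectors agree."""
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def find_mislabeled(
    recorded: Dict[str, Genotype],
    inferred: Dict[str, Genotype],
    threshold: float = 0.8,  # illustrative cut-off, not taken from the paper
) -> Dict[str, Tuple[str, float]]:
    """Return {sample: (best_matched_label, concordance)} for flagged samples.

    A sample is flagged when its inferred genotype agrees with the genotype
    recorded under its own label at fewer than `threshold` of the markers.
    """
    flagged = {}
    for sample, profile in inferred.items():
        if concordance(profile, recorded[sample]) >= threshold:
            continue  # own label fits well enough; nothing to report
        # Pick the label whose recorded genotype agrees best with the profile.
        best = max(recorded, key=lambda label: concordance(profile, recorded[label]))
        flagged[sample] = (best, concordance(profile, recorded[best]))
    return flagged


# Hypothetical data: the profiles of S1 and S2 have been swapped.
recorded = {
    "S1": [0, 0, 1, 1, 0, 1],
    "S2": [1, 1, 0, 0, 1, 0],
    "S3": [0, 1, 0, 1, 0, 1],
}
inferred = {
    "S1": [1, 1, 0, 0, 1, 0],  # matches the genotype recorded for S2
    "S2": [0, 0, 1, 1, 0, 1],  # matches the genotype recorded for S1
    "S3": [0, 1, 0, 1, 0, 1],  # consistent with its own label
}
result = find_mislabeled(recorded, inferred)
print(result)
```

In this toy input, S1 and S2 are flagged and each is matched to the other's label, while S3 passes the check. A real analysis would have to handle noisy inference, missing markers, and significance testing, none of which this sketch attempts.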

Published in PLoS ONE

ISSN: 1932-6203 (Online)
Publisher: Public Library of Science (PLoS)
Country of publisher: United States
LCC subjects: Medicine; Science
Website: https://journals.plos.org/plosone/
