Spiked proteomic standard dataset for testing label-free quantitative software and statistical methods

Claire Ramus; Agnès Hovasse; Marlène Marcellin; Anne-Marie Hesse; Emmanuelle Mouton-Barbosa; David Bouyssié; Sebastian Vaca; Christine Carapito; Karima Chaoui; Christophe Bruley; Jérôme Garin; Sarah Cianférani; Myriam Ferro; Alain Van Dorssaeler; Odile Burlet-Schiltz; Christine Schaeffer; Yohann Couté; Anne Gonzalez de Peredo

Data in Brief (Mar 2016)

Affiliations

Claire Ramus: ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France
Agnès Hovasse: ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
Marlène Marcellin: ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France
Anne-Marie Hesse: ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France
Emmanuelle Mouton-Barbosa: ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France
David Bouyssié: ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France
Sebastian Vaca: ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
Christine Carapito: ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
Karima Chaoui: ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France
Christophe Bruley: ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France
Jérôme Garin: ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France
Sarah Cianférani: ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
Myriam Ferro: ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France
Alain Van Dorssaeler: ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
Odile Burlet-Schiltz: ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France
Christine Schaeffer: ProFi, Proteomic French Infrastructure, France; Laboratoire de Spectrométrie de Masse BioOrganique (LSMBO), IPHC, Université de Strasbourg, CNRS, UMR7178, 25 Rue Becquerel, 67087 Strasbourg, France
Yohann Couté: ProFi, Proteomic French Infrastructure, France; CEA, DSV, iRTSV, Laboratoire de Biologie à Grande Echelle, Grenoble F-38054, France; INSERM U1038, Grenoble F-38054, France; Université Grenoble, F-38054, France
Anne Gonzalez de Peredo: ProFi, Proteomic French Infrastructure, France; CNRS UMR5089 Institut de Pharmacologie et de Biologie Structurale, 205 Route de Narbonne, 31077 Toulouse, France; Université de Toulouse, 118 Route de Narbonne, 31077 Toulouse, France

Journal volume & issue: Vol. 6, pp. 286–294

Abstract

This data article describes a controlled, spiked proteomic dataset for which the “ground truth” of variant proteins is known. It is based on the LC-MS analysis of samples composed of a fixed background of yeast lysate and different spiked amounts of the UPS1 mixture of 48 recombinant proteins. It can be used to objectively evaluate bioinformatic pipelines for label-free quantitative analysis, and their ability to detect variant proteins with good sensitivity and a low false discovery rate in large-scale proteomic studies. More specifically, it can be useful for tuning software tool parameters, for testing new algorithms for label-free quantitative analysis, or for evaluating downstream statistical methods. The raw MS files can be downloaded from ProteomeXchange with identifier http://www.ebi.ac.uk/pride/archive/projects/PXD001819. Starting from some raw files of this dataset, we also provide here some processed data obtained through various bioinformatics tools (including MaxQuant, Skyline, MFPaQ, IRMa-hEIDI and Scaffold) in different workflows, to exemplify the use of such data in the context of software benchmarking, as discussed in detail in the accompanying manuscript [1]. The experimental design used here for data processing takes advantage of the different spike levels introduced in the samples composing the dataset, and the processed data are merged into a single file to facilitate the evaluation and illustration of software tool results for the detection of variant proteins with different absolute expression levels and fold change values.
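Because the spiked UPS1 proteins are the only proteins expected to vary between samples, the list of proteins that a pipeline reports as variant can be scored directly against this ground truth. The Python sketch below is a minimal illustration of that idea, not the scoring procedure used by the authors. The function name `evaluate_calls` and the protein identifiers are hypothetical placeholders, and the false discovery proportion computed here is the observed fraction of false positives among the reported proteins.

```python
def evaluate_calls(called_variant, spiked_truth):
    """Score a list of proteins reported as variant against the known spiked set.

    called_variant: identifiers the pipeline reported as differentially abundant.
    spiked_truth:   identifiers of the spiked proteins (the ground truth).
    """
    called = set(called_variant)
    truth = set(spiked_truth)

    true_positives = len(called & truth)   # spiked proteins that were reported
    false_positives = len(called - truth)  # background proteins that were reported

    # Sensitivity: fraction of spiked proteins that were recovered.
    sensitivity = true_positives / len(truth) if truth else 0.0
    # False discovery proportion: fraction of reported proteins that are background.
    fdp = false_positives / len(called) if called else 0.0

    return {
        "true_positives": true_positives,
        "false_positives": false_positives,
        "sensitivity": sensitivity,
        "fdp": fdp,
    }


# Hypothetical identifiers, for illustration only (not taken from the dataset).
truth = {"UPS_A", "UPS_B", "UPS_C", "UPS_D"}
called = ["UPS_A", "UPS_B", "UPS_C", "YEAST_1"]

result = evaluate_calls(called, truth)
print(result)
```

In practice the same scoring would be repeated for each pairwise comparison of spike levels, since the number of spiked proteins a pipeline recovers is expected to depend on both the absolute amount and the fold change.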

Published in Data in Brief

ISSN: 2352-3409 (Online)
Publisher: Elsevier
Country of publisher: United States
LCC subjects: Medicine: Medicine (General): Computer applications to medicine. Medical informatics; Science: Science (General)
Website: http://www.journals.elsevier.com/data-in-brief/
