Selecting the most appropriate time points to profile in high-throughput studies
Michael Kleyman, Emre Sefer, Teodora Nicola, Celia Espinoza, Divya Chhabra, James S Hagood, Naftali Kaminski, Namasivayam Ambalavanan, Ziv Bar-Joseph
Affiliations
Michael Kleyman
Machine Learning and Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
Emre Sefer
Machine Learning and Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, United States
Teodora Nicola
Division of Neonatology, Department of Pediatrics, University of Alabama at Birmingham, Birmingham, United States
Celia Espinoza
Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, United States; Rady Children’s Hospital San Diego, San Diego, United States
Divya Chhabra
Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, United States; Rady Children’s Hospital San Diego, San Diego, United States
James S Hagood
Division of Respiratory Medicine, Department of Pediatrics, University of California, San Diego, United States; Rady Children’s Hospital San Diego, San Diego, United States
Naftali Kaminski
Section of Pulmonary, Critical Care and Sleep Medicine, School of Medicine, Yale University, New Haven, United States
Namasivayam Ambalavanan
Division of Neonatology, Department of Pediatrics, University of Alabama at Birmingham, Birmingham, United States
Biological systems are increasingly being studied by high-throughput profiling of molecular data over time. Determining the set of time points to sample in studies that profile several different types of molecular data remains challenging. Here we present the Time Point Selection (TPS) method, which solves this combinatorial problem in a principled and practical way. TPS uses expression data from a small set of genes sampled at a high rate to choose the time points. As we show by applying TPS to study mouse lung development, the points selected by TPS can be used to reconstruct an accurate representation of the expression values at the non-selected points. Further, even though the selection is based only on gene expression, these points are also appropriate for representing a much larger set of protein, miRNA and DNA methylation changes over time. TPS can thus serve as a key design strategy for high-throughput time series experiments. Supporting Website: www.sb.cs.cmu.edu/TPS
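To make the selection problem concrete, the following is a minimal sketch of the general idea: given densely sampled expression profiles, pick a small subset of time points from which the unselected points can be reconstructed with low error. It is an illustration under stated assumptions, not the TPS algorithm itself. The abstract does not specify the reconstruction model or search strategy, so this sketch assumes piecewise-linear interpolation and a simple greedy search, and all function names (`interpolate`, `reconstruction_error`, `greedy_select`) and the toy data are hypothetical.

```python
def interpolate(times, values, t):
    """Piecewise-linear interpolation of `values` sampled at sorted `times`."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            v0, v1 = values[i], values[i + 1]
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the sampled range")


def reconstruction_error(times, profiles, selected):
    """Mean squared error at the unselected points when each profile is
    reconstructed from the selected time-point indices only."""
    sel = sorted(selected)
    sel_times = [times[i] for i in sel]
    total, count = 0.0, 0
    for profile in profiles:
        sel_values = [profile[i] for i in sel]
        for i, t in enumerate(times):
            if i in selected:
                continue
            total += (interpolate(sel_times, sel_values, t) - profile[i]) ** 2
            count += 1
    return total / count if count else 0.0


def greedy_select(times, profiles, k):
    """Assumed strategy: always keep the first and last time points, then
    repeatedly add the point that most reduces the reconstruction error."""
    selected = {0, len(times) - 1}
    while len(selected) < k:
        candidates = [i for i in range(len(times)) if i not in selected]
        best = min(
            candidates,
            key=lambda i: reconstruction_error(times, profiles, selected | {i}),
        )
        selected.add(best)
    return sorted(times[i] for i in selected)


# Toy data (hypothetical): two genes sampled at seven time points,
# both changing slope at t = 2.
times = [0, 1, 2, 3, 4, 5, 6]
profiles = [
    [0, 2, 4, 4, 4, 4, 4],
    [1, 1, 1, 2, 3, 4, 5],
]
print(greedy_select(times, profiles, 3))  # [0, 2, 6]
```

In this toy case both profiles change slope at t = 2, so keeping that point together with the two endpoints reconstructs every unselected value exactly. A greedy search is only one possible heuristic for the combinatorial problem and is not guaranteed to find the best subset.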