Patterns (Jul 2021)

Generating hard-to-obtain information from easy-to-obtain information: Applications in drug discovery and clinical inference

  • Matthew Amodio,
  • Dennis Shung,
  • Daniel B. Burkhardt,
  • Patrick Wong,
  • Michael Simonov,
  • Yu Yamamoto,
  • David van Dijk,
  • Francis Perry Wilson,
  • Akiko Iwasaki,
  • Smita Krishnaswamy

Journal volume & issue
Vol. 2, no. 7
p. 100288

Abstract

Summary: When biological entities are measured in multiple ways, the resulting information often falls into distinct categories. Easy-to-obtain information (EI) can be gathered on virtually every subject of interest, while hard-to-obtain information (HI) can be gathered on only some. We propose building a model that makes probabilistic predictions of HI from EI. Our feature mapping GAN (FMGAN), based on the conditional GAN framework, uses an embedding network to process the conditions as part of conditional GAN training, creating manifold structure when it is not readily present in the conditions. We experiment on two tasks: generating RNA sequencing data of cell lines perturbed with a drug, conditioned on the drug's chemical structure; and generating FACS data from clinical monitoring variables on a cohort of COVID-19 patients, effectively describing their immune response in great detail.

The bigger picture: Many experiments face a trade-off between gathering easy-to-collect information on many samples and gathering hard-to-collect information on a smaller number of samples, owing to costs in both money and time. We demonstrate that a mapping between the easy-to-collect and hard-to-collect information can be trained as a conditional GAN on the subset of samples for which both were measured. With our conditional GAN model, feature mapping GAN (FMGAN), the results of expensive experiments can be predicted, saving the cost of actually performing them. We study two example settings where this could have impact. The first is pharmaceutical drug discovery, where early-phase experiments require casting a wide net to find just a few potential leads to follow. The second is the clinical setting, where standard measurements taken early in a patient's stay can predict the values of later single-cell-resolution samples.
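The summary describes FMGAN only at a high level: a conditional GAN whose conditions (EI) pass through an embedding network before reaching the generator, which produces HI. The following is a minimal NumPy sketch of that data flow, not the authors' implementation. All layer sizes, the leaky-ReLU activations, the shared embedder used by both generator and discriminator, and the names `embedder`, `sample_hi`, `discriminator_logits` and `gan_losses` are hypothetical choices for illustration. It shows forward passes and the standard non-saturating GAN losses only; a real model would train these weights with gradient descent in a deep-learning framework.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(in_dim, out_dim):
    """Randomly initialised weights and bias for one fully connected layer."""
    return rng.normal(0.0, 0.1, size=(in_dim, out_dim)), np.zeros(out_dim)

def forward(layers, x):
    """Apply dense layers with leaky-ReLU between them (no activation after the last)."""
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.where(x > 0, x, 0.2 * x)
    return x

# Hypothetical sizes: EI features, condition embedding, noise, HI features, hidden units.
EI_DIM, EMB_DIM, NOISE_DIM, HI_DIM, HIDDEN = 16, 8, 4, 32, 64

# Embedding network: raw conditions (EI) -> learned condition embedding.
embedder = [dense(EI_DIM, HIDDEN), dense(HIDDEN, EMB_DIM)]
# Generator: noise concatenated with the embedding -> synthetic HI sample.
generator = [dense(NOISE_DIM + EMB_DIM, HIDDEN), dense(HIDDEN, HI_DIM)]
# Discriminator: HI sample (real or fake) plus embedding -> real/fake logit.
discriminator = [dense(HI_DIM + EMB_DIM, HIDDEN), dense(HIDDEN, 1)]

def sample_hi(ei, n_samples):
    """Draw n_samples HI predictions per EI row by varying only the noise."""
    emb = forward(embedder, ei)                   # (batch, EMB_DIM)
    emb = np.repeat(emb, n_samples, axis=0)       # one copy per noise draw
    z = rng.normal(size=(emb.shape[0], NOISE_DIM))
    fake = forward(generator, np.concatenate([z, emb], axis=1))
    return fake.reshape(ei.shape[0], n_samples, HI_DIM)

def discriminator_logits(hi, ei):
    """Score HI samples given their EI conditions."""
    emb = forward(embedder, ei)
    return forward(discriminator, np.concatenate([hi, emb], axis=1))

def gan_losses(real_logits, fake_logits):
    """Standard non-saturating GAN losses; softplus(x) = log(1 + e^x)."""
    softplus = lambda x: np.logaddexp(0.0, x)
    d_loss = np.mean(softplus(-real_logits)) + np.mean(softplus(fake_logits))
    g_loss = np.mean(softplus(-fake_logits))
    return d_loss, g_loss
```

For example, `sample_hi(ei, 10)` on a batch of three EI rows returns an array of shape `(3, 10, 32)`. The spread across the 10 draws for a single row is what makes the prediction probabilistic rather than a single point estimate.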

Keywords