GAN-Based Training of Semi-Interpretable Generators for Biological Data Interpolation and Augmentation

Anastasios Tsourtis; Georgios Papoutsoglou; Yannis Pantazis

doi:10.3390/app12115434

Applied Sciences (May 2022)

Affiliations

Anastasios Tsourtis: Institute of Applied and Computational Mathematics, Foundation of Research and Technology Hellas, GR 700 13 Heraklion, Greece
Georgios Papoutsoglou: Institute of Applied and Computational Mathematics, Foundation of Research and Technology Hellas, GR 700 13 Heraklion, Greece
Yannis Pantazis: Institute of Applied and Computational Mathematics, Foundation of Research and Technology Hellas, GR 700 13 Heraklion, Greece

Journal volume & issue: Vol. 12, no. 11, p. 5434

Abstract

Single-cell measurements carry invaluable information regarding the state of each cell and its underlying regulatory mechanisms. The popularity and use of single-cell measurements continue to grow. Despite the typically large volume of collected data, the under-representation of important cell (sub-)populations negatively affects downstream analysis and its robustness. Therefore, enriching biological datasets with samples that belong to a rare state or manifold is advantageous. In this work, we train families of generative models by minimizing the Rényi divergence, which results in an adversarial training framework. Apart from the standard neural network-based models, we propose families of semi-interpretable generative models. The proposed models are further tailored to generate realistic gene expression measurements, whose characteristics include zero-inflation and sparsity, without the need for any data pre-processing. Explicit factors of the data, such as measurement time, state, or cluster, are taken into account by our generative models as conditional variables. We train the proposed conditional models, compare them against the state of the art on a range of synthetic and real datasets, and demonstrate their ability to accurately perform data interpolation and augmentation.

Published in Applied Sciences

ISSN: 2076-3417 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Engineering (General). Civil engineering (General); Science: Biology (General); Science: Physics; Science: Chemistry
Website: http://www.mdpi.com/journal/applsci
