Journal of Cheminformatics (Mar 2020)

COVER: conformational oversampling as data augmentation for molecules

  • Jennifer Hemmerich,
  • Ece Asilar,
  • Gerhard F. Ecker

DOI
https://doi.org/10.1186/s13321-020-00420-z
Journal volume & issue
Vol. 12, no. 1
pp. 1 – 12

Abstract

Read online

Abstract Training neural networks with small and imbalanced datasets often leads to overfitting and disregard of the minority class. For predictive toxicology, however, models with a good balance between sensitivity and specificity are needed. In this paper we introduce conformational oversampling as a means to balance and oversample datasets for prediction of toxicity. Conformational oversampling enhances a dataset by generation of multiple conformations of a molecule. These conformations can be used to balance, as well as oversample a dataset, thereby increasing the dataset size without the need of artificial samples. We show that conformational oversampling facilitates training of neural networks and provides state-of-the-art results on the Tox21 dataset.

Keywords