Exploring Dimensionality Reduction Techniques for Deep Learning Driven QSAR Models of Mutagenicity

Alexander D. Kalian; Emilio Benfenati; Olivia J. Osborne; David Gott; Claire Potter; Jean-Lou C. M. Dorne; Miao Guo; Christer Hogstrand

doi:10.3390/toxics11070572

Toxics (Jun 2023)

Affiliations

Alexander D. Kalian: Department of Nutritional Sciences, King’s College London, Franklin-Wilkins Building, 150 Stamford St., London SE1 9NH, UK
Emilio Benfenati: Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Via Mario Negri 2, 20156 Milano, Italy
Olivia J. Osborne: Food Standards Agency, 70 Petty France, London SW1H 9EX, UK
David Gott: Food Standards Agency, 70 Petty France, London SW1H 9EX, UK
Claire Potter: Food Standards Agency, 70 Petty France, London SW1H 9EX, UK
Jean-Lou C. M. Dorne: European Food Safety Authority (EFSA), Via Carlo Magno 1A, 43126 Parma, Italy
Miao Guo: Department of Engineering, King’s College London, Strand Campus, Strand, London WC2R 2LS, UK
Christer Hogstrand: Department of Analytical, Environmental and Forensic Sciences, King’s College London, Franklin-Wilkins Building, 150 Stamford St., London SE1 9NH, UK

DOI: https://doi.org/10.3390/toxics11070572
Journal volume & issue: Vol. 11, no. 7, p. 572

Abstract

Dimensionality reduction techniques are crucial for enabling deep learning driven quantitative structure-activity relationship (QSAR) models to navigate higher dimensional toxicological spaces; however, the choice of a specific technique is often arbitrary and poorly explored. Six dimensionality reduction techniques (both linear and non-linear) were therefore applied to a higher dimensionality mutagenicity dataset and compared in their ability to power a simple deep learning driven QSAR model, following grid searches for optimal hyperparameter values. Comparatively simple linear techniques, such as principal component analysis (PCA), were found to be sufficient for optimal QSAR model performance, which indicated that the original dataset was at least approximately linearly separable (in accordance with Cover’s theorem). However, certain non-linear techniques, such as kernel PCA and autoencoders, performed at closely comparable levels while (especially in the case of autoencoders) being more widely applicable to potentially non-linearly separable datasets. Analysis of the chemical space, in terms of XLogP and molecular weight, showed that the vast majority of testing data fell within the defined applicability domain, and that certain regions were measurably more problematic and antagonised performance. The results nevertheless indicated that certain dimensionality reduction techniques were able to facilitate uniquely beneficial navigation of the chemical space.
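
As a purely illustrative sketch of the simplest technique named in the abstract, the snippet below applies PCA (computed via a singular value decomposition in NumPy) to a synthetic high-dimensional matrix. It is not the authors' pipeline: the function name `pca_reduce`, the synthetic data, and the choice of three components are all assumptions made for this example, and the descriptor matrix stands in for whatever molecular features a QSAR model would actually use.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (via SVD).

    Hypothetical helper for illustration; not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # explained variance ratios
    return Xc @ components.T, explained[:n_components]

# Synthetic stand-in for a high-dimensional descriptor matrix (assumption):
# 200 "molecules", 50 features generated from 3 latent factors plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 50))

Z, ratio = pca_reduce(X, n_components=3)
print("reduced shape:", Z.shape)
print("explained variance captured:", ratio.sum())
```

The reduced matrix `Z` would then be passed to a downstream classifier in place of `X`. A non-linear technique such as kernel PCA or an autoencoder would replace only the `pca_reduce` step, which is what makes a like-for-like comparison of techniques possible.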

Published in Toxics

ISSN: 2305-6304 (Online)
Publisher: MDPI AG
Country of publisher: Switzerland
LCC subjects: Technology: Chemical technology
Website: http://www.mdpi.com/journal/toxics/
