Statistical approaches enabling technology-specific assay interference prediction from large screening data sets

Vincenzo Palmacci; Steffen Hirte; Jorge Enrique Hernández González; Floriane Montanari; Johannes Kirchmair

Artificial Intelligence in the Life Sciences (Jun 2024)

Affiliations

Vincenzo Palmacci: Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria; Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, 1090 Vienna, Austria; Department of Machine Learning Research, Bayer AG, 13353 Berlin, Germany
Steffen Hirte: Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria; Vienna Doctoral School of Pharmaceutical, Nutritional and Sport Sciences (PhaNuSpo), University of Vienna, 1090 Vienna, Austria
Jorge Enrique Hernández González: Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria; Department of Physics, Sao Paulo State University, Rua Cristóvão Colombo 2265, São José do Rio Preto, CEP 15054-000, Brazil
Floriane Montanari: Department of Machine Learning Research, Bayer AG, 13353 Berlin, Germany
Johannes Kirchmair (corresponding author): Department of Pharmaceutical Sciences, Division of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, 1090 Vienna, Austria; Christian Doppler Laboratory for Molecular Informatics in the Biosciences, Department for Pharmaceutical Sciences, University of Vienna, 1090 Vienna, Austria

Journal volume & issue: Vol. 5, p. 100099

Abstract

High-throughput screening (HTS) technologies allow the biological testing of hundreds of thousands of compounds per day. Typically, a substantial proportion of the initial hits obtained by HTS are artifacts caused by assay interference. Therefore, global and technology-specific in silico models for identifying and predicting compounds that interfere with biological assays have been developed. The global models benefit from training on large screening data sets, while the specialized models benefit from training on assay technology-specific experimental data. In this work, we develop and explore strategies for generating better predictors of technology-specific assay interference. These strategies use the large bioactivity data matrices that global models are trained on and employ partially new compound labeling approaches to maintain the assay technology awareness of specialized models. We demonstrate the utility of the statistically derived interference labels in machine learning using fluorescence-based assay interference as a representative example. Our random forest and multi-layer perceptron classifiers showed improved performance compared to existing models, achieving Matthews correlation coefficients (MCCs) of up to 0.47 on holdout data and up to 0.45 on an external test set. These results demonstrate that accurate assay-specific interference labels can be derived from large bioactivity data matrices, enabling the development of new machine-learning models without the need for further experimental data.
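
For readers unfamiliar with the metric reported above, the following is a minimal sketch of how a Matthews correlation coefficient is computed for a binary classifier (for example, interfering vs. non-interfering compounds). It uses the standard textbook definition of the MCC; the function names and the toy labels are illustrative assumptions and are not taken from the paper or its data.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = interfering, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Standard MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical toy labels, chosen only to exercise the functions.
    y_true = [1, 1, 0, 0, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 1, 1, 0, 0]
    counts = confusion_counts(y_true, y_pred)
    print("TP, TN, FP, FN =", counts)
    print("MCC =", round(matthews_corrcoef(*counts), 3))
```

The MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it accounts for all four cells of the confusion matrix, which makes it a common choice for imbalanced data sets such as screening data.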

Published in Artificial Intelligence in the Life Sciences

ISSN: 2667-3185 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Science: Science (General)
Website: https://www.journals.elsevier.com/artificial-intelligence-in-the-life-sciences
