Conformal efficiency as a metric for comparative model assessment befitting federated learning

Wouter Heyndrickx; Adam Arany; Jaak Simm; Anastasia Pentina; Noé Sturm; Lina Humbeck; Lewis Mervin; Adam Zalewski; Martijn Oldenhof; Peter Schmidtke; Lukas Friedrich; Regis Loeb; Arina Afanasyeva; Ansgar Schuffenhauer; Yves Moreau; Hugo Ceulemans

Artificial Intelligence in the Life Sciences (Dec 2023)

Affiliations

Wouter Heyndrickx: Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium; Corresponding author.
Adam Arany: KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Jaak Simm: KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Anastasia Pentina: Machine Learning Research, Research & Development, Pharmaceuticals, Bayer AG, Berlin 10117, Federal Republic of Germany
Noé Sturm: Novartis Institutes for BioMedical Research, Novartis Campus, Basel CH-4002, Switzerland
Lina Humbeck: Medicinal Chemistry Department, Boehringer Ingelheim Pharma GmbH & Co. KG, Birkendorfer Str. 65, Biberach an der Riss 88397, Federal Republic of Germany
Lewis Mervin: Molecular AI, Discovery Sciences, R&D, AstraZeneca, Cambridge, UK
Adam Zalewski: Amgen Research (Munich) GmbH, Staffelseestraße 2, Munich 81477, Federal Republic of Germany
Martijn Oldenhof: KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Peter Schmidtke: Discngine, 79 Avenue Ledru Rollin, Paris 75012, France
Lukas Friedrich: Global Research & Development, Merck KGaA, Frankfurter Strasse 250, Darmstadt 64293, Federal Republic of Germany
Regis Loeb: KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Arina Afanasyeva: Modality Informatics Group, Digital Research Solutions, Advanced Informatics & Analytics, Astellas Pharma Inc., 21, Miyukigaoka, Tsukuba-shi, Ibaraki 305-8585, Japan
Ansgar Schuffenhauer: Novartis Institutes for BioMedical Research, Novartis Campus, Basel CH-4002, Switzerland
Yves Moreau: KU Leuven, ESAT-STADIUS, Kasteelpark Arenberg 10, Heverlee 3001, Belgium
Hugo Ceulemans: Janssen Pharmaceutica NV, Turnhoutseweg 30, Beerse 2340, Belgium

Journal volume & issue: Vol. 3, p. 100070

Abstract

In a drug discovery setting, pharmaceutical companies own substantial but confidential datasets. The MELLODDY project developed a privacy-preserving federated machine learning solution and deployed it at an unprecedented scale. Each partner built models for its own private assays that benefited from a shared representation. Established predictive performance metrics such as AUC ROC or AUC PR are constrained to unseen labeled chemical space and cannot gauge performance gains in unlabeled chemical space. Federated learning indirectly extends labeled space, but in a privacy-preserving context, a partner cannot use this label extension for performance assessment. Metrics that estimate the uncertainty of a prediction can be calculated even where no label is known. Practically, the chemical space covered by predictions that clear an uncertainty threshold reflects the applicability domain of a model. After establishing a link to established performance metrics, we propose the efficiency from the conformal prediction framework (‘conformal efficiency’) as a proxy for the size of the applicability domain. A documented extension of the applicability domain would qualify as a tangible benefit from federated learning. In interim assessments, MELLODDY partners reported a median increase of 5.5% in conformal efficiency for the federated model over the single-partner model (with increases of up to 9.7%). Subject to distributional conditions, that efficiency increase can be directly interpreted as the expected increase in conformal, i.e. low-uncertainty, predictions. In conclusion, we present the first indication that privacy-preserving federated machine learning across massive drug-discovery datasets from ten pharma partners indeed extends the applicability domain of property prediction models.
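
The idea that an uncertainty-based metric needs no labels on the compounds being assessed can be illustrated with a minimal sketch. It assumes a common definition of conformal efficiency for binary classification, namely the fraction of compounds whose conformal prediction set contains exactly one label, and a class-conditional (Mondrian) calibration with nonconformity taken as one minus the predicted probability of the candidate label. The abstract does not spell out the authors' exact formulation, so the function names, the nonconformity measure and the toy numbers below are illustrative assumptions rather than the MELLODDY implementation.

```python
def calibrate(cal_probs, cal_labels):
    """Collect nonconformity scores per class from a labeled calibration set.

    Assumption: nonconformity = 1 - predicted probability of the true label.
    """
    scores = {0: [], 1: []}
    for prob_active, label in zip(cal_probs, cal_labels):
        prob_true = prob_active if label == 1 else 1.0 - prob_active
        scores[label].append(1.0 - prob_true)
    return scores


def p_value(class_scores, score):
    """Conformal p-value: share of calibration scores at least as large."""
    n_ge = sum(1 for s in class_scores if s >= score)
    return (n_ge + 1) / (len(class_scores) + 1)


def prediction_set(prob_active, scores, epsilon):
    """Labels that cannot be rejected at significance level epsilon."""
    labels = set()
    for label in (0, 1):
        prob_label = prob_active if label == 1 else 1.0 - prob_active
        if p_value(scores[label], 1.0 - prob_label) > epsilon:
            labels.add(label)
    return labels


def conformal_efficiency(probs, scores, epsilon):
    """Fraction of compounds with a single-label prediction set.

    Only predicted probabilities are needed here, not labels, so the
    metric can be evaluated on unlabeled chemical space.
    """
    sets = [prediction_set(p, scores, epsilon) for p in probs]
    return sum(1 for s in sets if len(s) == 1) / len(sets)


# Toy calibration data (hypothetical, for illustration only).
cal_probs = [0.9, 0.8, 0.7, 0.6, 0.2, 0.1, 0.3, 0.4]
cal_labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = calibrate(cal_probs, cal_labels)

# Predicted probabilities for unlabeled compounds.
unlabeled_probs = [0.95, 0.5, 0.35, 0.05]
print(conformal_efficiency(unlabeled_probs, scores, epsilon=0.25))
```

Under these assumptions, comparing a federated and a single-partner model amounts to computing this fraction for each model on the same unlabeled compounds at the same significance level and reporting the difference.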

Published in Artificial Intelligence in the Life Sciences

ISSN: 2667-3185 (Online)
Publisher: Elsevier
Country of publisher: Netherlands
LCC subjects: Science: Science (General)
Website: https://www.journals.elsevier.com/artificial-intelligence-in-the-life-sciences
