Benchmarking machine learning methods for synthetic lethality prediction in cancer

Yimiao Feng; Yahui Long; He Wang; Yang Ouyang; Quan Li; Min Wu; Jie Zheng

doi:10.1038/s41467-024-52900-7

Nature Communications (Oct 2024)

Affiliations

Yimiao Feng: School of Information Science and Technology, ShanghaiTech University
Yahui Long: Bioinformatics Institute (BII), Agency for Science, Technology and Research (A*STAR)
He Wang: School of Information Science and Technology, ShanghaiTech University
Yang Ouyang: School of Information Science and Technology, ShanghaiTech University
Quan Li: School of Information Science and Technology, ShanghaiTech University
Min Wu: Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR)
Jie Zheng: School of Information Science and Technology, ShanghaiTech University

DOI: https://doi.org/10.1038/s41467-024-52900-7
Journal volume & issue: Vol. 15, no. 1, pp. 1–14

Abstract

Synthetic lethality (SL) is a gold mine of anticancer drug targets, exposing cancer-specific dependencies of cellular survival. To complement resource-intensive experimental screening, many machine learning methods for SL prediction have emerged recently. However, a comprehensive benchmarking is lacking. This study systematically benchmarks 12 recent machine learning methods for SL prediction, assessing their performance across diverse data splitting scenarios, negative sample ratios, and negative sampling techniques, on both classification and ranking tasks. We observe that all the methods can perform significantly better by improving data quality, e.g., excluding computationally derived SLs from training and sampling negative labels based on gene expression. Among the methods, SLMGAE performs the best. Furthermore, the methods have limitations in realistic scenarios such as cold-start independent tests and context-specific SLs. These results, together with source code and datasets made freely available, provide guidance for selecting suitable methods and developing more powerful techniques for SL virtual screening.

Published in Nature Communications

ISSN: 2041-1723 (Online)
Publisher: Nature Portfolio
Country of publisher: United Kingdom
LCC subjects: Science
Website: https://www.nature.com/ncomms/
