Supervised chemical graph mining improves drug-induced liver injury prediction
Sangsoo Lim,
Youngkuk Kim,
Jeonghyeon Gu,
Sunho Lee,
Wonseok Shin,
Sun Kim
Affiliations
Sangsoo Lim
Bioinformatics Institute, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
Youngkuk Kim
Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
Jeonghyeon Gu
Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
Sunho Lee
AIGENDRUG Co., Ltd., Gwanak-ro 1, Seoul 08826, South Korea
Wonseok Shin
Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea
Sun Kim
Department of Computer Science and Engineering, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea; Interdisciplinary Program in Artificial Intelligence, Seoul National University, Gwanak-ro 1, Seoul 08826, South Korea; AIGENDRUG Co., Ltd., Gwanak-ro 1, Seoul 08826, South Korea; Corresponding author
Summary: Drug-induced liver injury (DILI) is the main cause of drug failure in clinical trials. Characterizing toxic compounds by their chemical structure is important because compounds can be metabolized into toxic substances in the liver. Traditional machine learning approaches have had limited success in predicting DILI, and emerging deep graph neural network (GNN) models are not yet powerful enough to predict it. In this study, we developed a completely different approach, supervised subgraph mining (SSM), a strategy that mines explicit subgraph features by iteratively updating individual graph transitions to maximize DILI fidelity. Our method outperformed previous methods, including state-of-the-art GNN tools, in classifying DILI on two different datasets: DILIst and TDC-benchmark. We also combined the subgraph features using SMARTS-based frequent structural pattern matching and associated them with the drugs’ ATC codes.