Computational and Structural Biotechnology Journal (Dec 2024)

Rapid discovery of Transglutaminase 2 inhibitors for celiac disease with boosting ensemble machine learning

  • Ibrahim Wichka,
  • Pin-Kuang Lai

Journal volume & issue
Vol. 23
pp. 3669 – 3679

Abstract

Celiac disease poses a significant health challenge for individuals consuming gluten-containing foods. While the availability of gluten-free products has increased, there is still a need for therapeutic treatments. The advancement of computational drug design, particularly using bio-cheminformatics-oriented machine learning, offers promising avenues for developing such therapies. One promising target is Transglutaminase 2 (TG2), a protein involved in the autoimmune response triggered by gluten consumption. In this study, we utilized data from approximately 1100 TG2 inhibition assays to develop ligand-based molecular screening techniques using ensemble machine-learning models and extensive molecular feature libraries. Various classifiers, including tree-based methods, artificial neural networks, and graph neural networks, were evaluated to identify primary systems for predictive analysis and feature significance assessment. Boosting ensembles of perceptron deep learning and low-depth random forest weak learners emerged as the most effective, achieving over 90% accuracy, significantly outperforming a baseline of 64%. Key features, such as the presence of a terminal Michael acceptor group and a sulfonamide group, were identified as important for activity. Additionally, a regression model was created to rank active compounds. We developed a web application, Celiac Informatics (https://celiac-informatics-v1-2b0a85e75868.herokuapp.com), to facilitate the screening of potential therapeutic molecules for celiac disease. The web app also provides drug-likeness reports, supporting the development of novel drugs.
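
The idea of boosting weak learners described in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' pipeline: it uses discrete AdaBoost with one-feature decision stumps as a stand-in for the perceptron and low-depth random forest weak learners named above, and the data are a synthetic toy set of four binary features (hypothetical substitutes for molecular descriptors), not TG2 assay data. The majority-class baseline shown is one common choice; the abstract does not state how its 64% baseline was computed. All function names are illustrative.

```python
import itertools
import math

# Toy data (assumption): 4 binary features; a sample is "active" (+1)
# when at least two of the first three bits are set. Bit 3 is noise.
X = [list(bits) for bits in itertools.product([0, 1], repeat=4)]
y = [1 if sum(x[:3]) >= 2 else -1 for x in X]


def stump_predict(x, feature, polarity):
    """Weak learner: vote `polarity` if the feature is present, else the opposite."""
    return polarity if x[feature] else -polarity


def fit_stump(X, y, w):
    """Return (weighted_error, feature, polarity) of the best single-feature stump."""
    best = None
    for feature in range(len(X[0])):
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(xi, feature, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, feature, polarity)
    return best


def adaboost(X, y, rounds):
    """Discrete AdaBoost: reweight samples so later stumps focus on earlier mistakes."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, feature, polarity = fit_stump(X, y, w)
        if err >= 0.5:  # no stump is better than chance; stop early
            break
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feature, polarity))
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, feature, polarity))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * stump_predict(x, f, p) for a, f, p in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    return sum(predict(ensemble, xi) == yi for xi, yi in zip(X, y)) / len(y)


def majority_baseline(y):
    """Accuracy of always predicting the most frequent class."""
    positives = sum(1 for yi in y if yi == 1)
    return max(positives, len(y) - positives) / len(y)


if __name__ == "__main__":
    print("baseline:", majority_baseline(y))
    print("1 stump :", accuracy(adaboost(X, y, rounds=1), X, y))
    print("3 stumps:", accuracy(adaboost(X, y, rounds=3), X, y))
```

On this toy set a single stump cannot separate the classes, while three boosted stumps do; the figures are training accuracies on synthetic data and say nothing about the performance reported in the study.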

Keywords