Computational and Structural Biotechnology Journal (Dec 2024)

Automated machine learning in nanotoxicity assessment: A comparative study of predictive model performance

  • Xiao Xiao,
  • Tung X. Trinh,
  • Zayakhuu Gerelkhuu,
  • Eunyong Ha,
  • Tae Hyun Yoon

Journal volume & issue
Vol. 25
pp. 9 – 19

Abstract

Computational modeling has attracted significant interest as an alternative to animal testing for toxicity assessment. However, selecting an appropriate algorithm and fine-tuning hyperparameters to develop optimized models takes considerable time, expertise, and an intensive search. The recent emergence of automated machine learning (autoML) approaches, available as user-friendly platforms, has proven beneficial for individuals with limited knowledge of ML-based predictive model development. These autoML platforms automate crucial steps in model development, including data preprocessing, algorithm selection, and hyperparameter tuning. In this study, we used seven previously published and publicly available datasets for oxides and metals to develop nanotoxicity prediction models. Three autoML platforms, namely Vertex AI, Azure, and Dataiku, were employed, and the performance measures of the resulting autoML-based models (accuracy, F1 score, precision, and recall) were compared with those of conventional ML-based models. The results demonstrated that the autoML platforms produced more reliable nanotoxicity prediction models, outperforming those built with conventional ML algorithms. While none of the three autoML platforms significantly outperformed the others, they differ in the options they offer for choosing technical features throughout the model development steps. This allows users to select an autoML platform that matches both their knowledge of predictive model development and the technical features the platform provides. Additionally, prediction models constructed from datasets with better data quality performed better than those built from datasets with lower data quality, indicating that future studies with high-quality datasets can further improve the performance of autoML-based prediction models.
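
The four performance measures named in the abstract have standard definitions for a binary classifier, and a minimal Python sketch of them may help readers compare models themselves. This is an illustration only, not the authors' evaluation code: the function name `classification_metrics`, the label convention (1 = toxic, 0 = non-toxic), and the toy labels are all assumptions made up for this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Hypothetical helper for illustration; `positive` marks the class
    treated as "toxic" (an assumed convention, not from the study).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Fall back to 0.0 when a denominator is zero (no predicted or no
    # actual positives) instead of raising ZeroDivisionError.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels for eight samples; these are not data from the study.
toy_true = [1, 1, 1, 0, 0, 0, 1, 0]
toy_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(classification_metrics(toy_true, toy_pred))
```

Reporting all four measures together guards against a model that looks accurate only because one class dominates the dataset: precision and recall expose false positives and false negatives separately, and F1 summarizes the two.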

Keywords