IEEE Access (Jan 2024)

Diving Into AutoML in Medical Imaging: Solution for Non-ML Practitioners

  • Ana Rodrigues,
  • Tiago Almeida,
  • Luis Bastiao Silva,
  • Carlos Costa

DOI
https://doi.org/10.1109/ACCESS.2024.3441469
Journal volume & issue
Vol. 12
pp. 151275 – 151302

Abstract

Within the healthcare sector, deploying Machine Learning (ML) models involves trial-and-error approaches, considerable time to create task-specific models, and collaboration between ML experts and physicians. To overcome these hurdles, Automated Machine Learning (AutoML) emerges as a promising resource for advancing precision and personalised medicine. Despite being a hot topic, AutoML remains relatively underexplored in medical imaging. To address this gap, our research objectives are to enrich the body of knowledge by comprehensively exploring AutoML solutions, evaluating and comparing their performance, and verifying their compliance with Explainable Artificial Intelligence (XAI) principles. Furthermore, we provide a unified solution built on open-source AutoML tools, which aim to democratize access for less experienced users. This paper therefore presents a methodology to identify and evaluate 5 solutions, Fastai, Ktrain, Ludwig, Autogluon, and Autokeras, across 3 relevant medical imaging datasets. The evaluation involved binary classification using chest X-rays and breast histopathological images, and a multi-class scenario with brain Magnetic Resonance Imaging (MRI) images. Results reveal that both Ktrain and Fastai consistently show lower F1-Score values, regardless of the dataset. This stems from their limited level of automation, as the user is required to specify the type of architecture and fine-tune certain hyperparameters. In contrast, Ludwig, when employing the EfficientNet architecture, takes roughly 10 minutes to achieve an F1-Score of 98.06 ± 0.86% with the chest X-ray dataset, about 17 minutes to reach 98.16 ± 0.50% with the histopathological dataset, and nearly 7 minutes to reach 96.52 ± 1.07% with the brain MRI dataset. On the brain MRI dataset, Autokeras and Autogluon failed to provide suitable metrics for a typical multi-class problem. Nonetheless, in both binary scenarios, they achieve state-of-the-art results. Although their execution time is higher than Ludwig's, they have a more robust search space. Finally, our findings highlight the need to incorporate XAI principles to improve the user’s experience, particularly for non-ML practitioners.

Keywords