Multimodal Technologies and Interaction (May 2025)

Integrated Hyperparameter Optimization with Dimensionality Reduction and Clustering for Radiomics: A Bootstrapped Approach

  • S. J. Pawan,
  • Matthew Muellner,
  • Xiaomeng Lei,
  • Mihir Desai,
  • Bino Varghese,
  • Vinay Duddalwar,
  • Steven Y. Cen

DOI
https://doi.org/10.3390/mti9050049
Journal volume & issue
Vol. 9, no. 5
p. 49

Abstract


Radiomics involves extracting quantitative features from medical images, resulting in high-dimensional data. Unsupervised clustering has been used to discover patterns in radiomic features, potentially yielding hidden biological insights. However, its effectiveness depends on the selection of dimensionality reduction techniques, clustering methods, and hyperparameter optimization, an area with limited exploration in the literature. We present a novel bootstrapping-based hyperparameter search approach to optimize clustering efficacy, treating dimensionality reduction and clustering as a connected process chain. The hyperparameter search was guided by the Adjusted Rand Index (ARI) and Davies–Bouldin Index (DBI) within a bootstrapping framework of 100 iterations. The cluster assignments were generated through 10-fold cross-validation, and a grid search strategy was used to explore hyperparameter combinations. We evaluated ten unsupervised learning pipelines using both simulation studies and real-world radiomics data derived from multiphase CT images of renal cell carcinoma. In simulations, we found that Non-negative Matrix Factorization (NMF) and Spectral Clustering outperformed the traditional Principal Component Analysis (PCA)-based approach. The best-performing pipeline (NMF followed by K-means clustering) successfully identified all three simulated clusters, achieving a Cramér’s V of 0.9. The simulation also established a reference framework for understanding the concordance patterns among different pipelines under varying strengths of clustering effects. High concordance reflects strong clustering. In the real-world data application, we observed a moderate clustering effect, which aligned with the weak associations to clinical outcomes, as indicated by the highest AUROC of 0.63.
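The search described above can be illustrated with a minimal sketch for a single pipeline (NMF followed by K-means), assuming scikit-learn and synthetic data. Several choices here are assumptions made for illustration and are not taken from the paper: the function names reduce_and_cluster and bootstrap_grid_search are hypothetical, ARI is computed between each bootstrap-fitted partition and a full-data reference partition, combinations are ranked by mean ARI with mean DBI as a tie-breaker, and the 10-fold cross-validation step is omitted for brevity.

```python
from itertools import product

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import NMF
from sklearn.metrics import adjusted_rand_score, davies_bouldin_score
from sklearn.preprocessing import MinMaxScaler


def reduce_and_cluster(X_fit, X_all, n_components, n_clusters, seed):
    """Fit NMF then K-means on X_fit; return the embedding and labels of X_all."""
    nmf = NMF(n_components=n_components, init="nndsvda",
              max_iter=1000, random_state=seed)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    km.fit(nmf.fit_transform(X_fit))
    Z_all = nmf.transform(X_all)
    return Z_all, km.predict(Z_all)


def bootstrap_grid_search(X, grid, n_boot=100, seed=0):
    """Score every (n_components, n_clusters) pair over bootstrap resamples.

    Assumed scoring rule (not from the paper): higher mean ARI is better,
    lower mean DBI breaks ties.
    """
    rng = np.random.default_rng(seed)
    results = []
    for n_components, n_clusters in product(grid["n_components"],
                                            grid["n_clusters"]):
        # Reference partition obtained by fitting on the full data set.
        _, ref = reduce_and_cluster(X, X, n_components, n_clusters, seed)
        aris, dbis = [], []
        for _ in range(n_boot):
            # Resample rows with replacement, fit on the resample,
            # then assign every original sample to a cluster.
            idx = rng.integers(0, len(X), size=len(X))
            Z, labels = reduce_and_cluster(X[idx], X, n_components,
                                           n_clusters, seed)
            if len(np.unique(labels)) < 2:
                continue  # DBI is undefined for a single cluster
            aris.append(adjusted_rand_score(ref, labels))
            dbis.append(davies_bouldin_score(Z, labels))
        if aris:
            results.append({"n_components": n_components,
                            "n_clusters": n_clusters,
                            "ari": float(np.mean(aris)),
                            "dbi": float(np.mean(dbis))})
    return sorted(results, key=lambda r: (-r["ari"], r["dbi"]))


# Synthetic stand-in for a radiomics feature matrix (three simulated clusters).
X, _ = make_blobs(n_samples=150, centers=3, n_features=6, random_state=0)
X = MinMaxScaler().fit_transform(X)  # NMF requires non-negative input

grid = {"n_components": [2, 3], "n_clusters": [2, 3, 4]}
ranking = bootstrap_grid_search(X, grid, n_boot=10, seed=0)  # paper uses 100
print(ranking[0])
```

Treating the two steps as one chain means the NMF rank and the number of clusters are scored jointly, so a reduction setting is judged only by the clustering it enables. Swapping in another reducer or clusterer (for example PCA or Spectral Clustering) would only require changing reduce_and_cluster.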

Keywords