BMC Cancer (Apr 2025)

Evaluating key predictors of breast cancer through survival: a comparison of AFT frailty models with LASSO, ridge, and elastic net regularization

  • Senyefia Bosson-Amedenu,
  • Emmanuel Ayitey,
  • Francis Ayiah-Mensah,
  • Luyton Asare

DOI
https://doi.org/10.1186/s12885-025-14040-z
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 26

Abstract

Background: Frailty models are widely used in survival analysis to account for unobserved heterogeneity among individuals. However, selecting the most robust model for survival prediction, especially with high-dimensional data, remains a challenge. This study evaluates the performance of several Accelerated Failure Time (AFT) frailty models and examines how regularization techniques (LASSO, Ridge, and Elastic Net) affect model selection and prediction accuracy.

Methods: We used both simulated datasets and a real breast cancer dataset to compare seven AFT frailty models: Weibull, Log-logistic, Gamma, Gompertz, Log-normal, Generalized Gamma, and the Extreme Value Frailty AFT model. Model performance was evaluated using the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), Mean Absolute Error (MAE), and Mean Squared Error (MSE) across three sample sizes (25%, 50%, and 75%). To improve parameter estimation and reduce overfitting in high-dimensional survival data, we applied LASSO, Ridge, and Elastic Net regularization.

Results: The Extreme Value Frailty AFT model consistently outperformed all other models across sample sizes, with the lowest AIC, BIC, MAE, and MSE, indicating superior fit and predictive accuracy. Its AIC ranged from 100.41 at the 25% sample size to 384.58 at the 75% sample size, consistently outperforming the second-best Log-logistic model. LASSO regularization further improved the model's parsimony by eliminating non-informative covariates, such as Age, PR, and Hospitalization, while retaining essential predictors such as Competing Risks, Metastasis, Stage, and Lymph Node involvement.
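The comparison criteria and penalties named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the Elastic Net penalty uses the common mixing-parameter convention (alpha = 1 gives LASSO, alpha = 0 gives Ridge), and all input values are made up for demonstration.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike Information Criterion: 2k - 2*logL (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """Bayesian Information Criterion: k*ln(n) - 2*logL (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


def elastic_net_penalty(coefs, lam: float, alpha: float) -> float:
    """Penalty lam * (alpha * L1 + (1 - alpha)/2 * L2^2).

    Assumed convention: alpha = 1 is pure LASSO, alpha = 0 is pure Ridge.
    The penalty is added to the negative log-likelihood being minimized.
    """
    l1 = sum(abs(b) for b in coefs)
    l2_squared = sum(b * b for b in coefs)
    return lam * (alpha * l1 + 0.5 * (1 - alpha) * l2_squared)


def mae(y_true, y_pred) -> float:
    """Mean Absolute Error between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred) -> float:
    """Mean Squared Error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Hypothetical fitted model: log-likelihood -45.0, 5 parameters, 100 subjects.
    print("AIC:", aic(-45.0, 5))
    print("BIC:", bic(-45.0, 5, 100))
    # Hypothetical coefficient vector; a zero entry mimics a covariate dropped by LASSO.
    beta = [1.0, -2.0, 0.0]
    print("LASSO penalty:", elastic_net_penalty(beta, lam=0.5, alpha=1.0))
    print("Ridge penalty:", elastic_net_penalty(beta, lam=0.5, alpha=0.0))
```

Only the L1 term can shrink a coefficient exactly to zero, which is why LASSO, unlike Ridge, can eliminate covariates outright.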
Conclusion: The Extreme Value Frailty AFT model demonstrated strong predictive performance in survival analysis, particularly when combined with LASSO regularization to enhance interpretability and generalizability. Key predictors, including Comorbidity, Metastasis, Stage, and Lymph Node involvement, remained significant after regularization, with reduced coefficients. Notably, patients without metastasis had 2.63 times longer expected survival than those with metastatic disease, while lower-stage diagnoses and minimal lymph node involvement contributed to 26% and 16% longer survival times, respectively. Other significant factors included recurrence status (19% increase in survival), HER2 negativity (20% longer survival), absence of the Triple Negative subtype (15% longer survival), and lower tumor grades (11% longer survival). By effectively shrinking less relevant variables, LASSO mitigated overfitting while preserving critical predictors, reinforcing the importance of tumor characteristics and molecular markers in survival outcomes.

The study highlights the crucial role of risk stratification: patients categorized into Low, Medium, and High-risk groups exhibit distinct survival patterns, in line with the Extreme Value AFT Frailty Model. The forest plot analysis further validates the strong impact of significant covariates, with Competing Risks, Lymph Node Involvement, and Metastasis emerging as the most critical prognostic factors. Kaplan–Meier survival analysis reveals sharp survival declines associated with metastasis, lymph node involvement, tumor grade, HER2 status, and molecular subtypes, reinforcing the urgent need for early detection and targeted interventions. Notably, patients with Triple Negative and HER2-overexpressing subtypes exhibit the poorest survival outcomes, underscoring the necessity for subtype-specific therapies. Additionally, competing risks, particularly hospitalization-related factors, substantially impact survival, pointing to the need for integrated treatment approaches. These findings emphasize the role of advanced statistical techniques in improving survival predictions, providing valuable insights that can enhance clinical decision-making in breast cancer prognosis and broader medical research.
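The survival-time statements above can be read as AFT time ratios. In an AFT model of the form log T = x'β + error, exp(β) is the multiplicative change in expected survival time per unit change in a covariate. The sketch below assumes the reported figures (2.63 times, 26%, 16%) are such exponentiated coefficients. The implied β values are back-calculated for illustration only and are not reported in the abstract, and the function names are hypothetical.

```python
import math


def time_ratio(beta: float) -> float:
    """Time ratio exp(beta): multiplicative effect on expected survival time."""
    return math.exp(beta)


def percent_change(beta: float) -> float:
    """Percentage change in expected survival time implied by beta."""
    return (math.exp(beta) - 1.0) * 100.0


def coef_from_time_ratio(ratio: float) -> float:
    """Back-calculate the AFT coefficient implied by a time ratio."""
    return math.log(ratio)


if __name__ == "__main__":
    # Time ratios as stated in the abstract (assumed to be exp(beta)).
    reported = {
        "No metastasis": 2.63,
        "Lower stage": 1.26,
        "Minimal lymph node involvement": 1.16,
    }
    for name, ratio in reported.items():
        beta = coef_from_time_ratio(ratio)  # illustrative, not a reported value
        print(f"{name}: time ratio {ratio:.2f}, implied beta {beta:.3f}, "
              f"{percent_change(beta):.0f}% longer survival")
```

A time ratio above 1 lengthens expected survival and a ratio below 1 shortens it, so a coefficient shrunk toward zero by regularization moves its time ratio toward 1.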

Keywords