Journal of Applied Mathematics (Jan 2025)
Comparison of Semiparametric Models in the Presence of Noise and Outliers
Abstract
Previous studies have compared smoothers for generalized additive models (GAMs), namely thin plate splines (tp), P-splines (ps), cubic regression splines (cr), and Gaussian processes (gp), in settings that include discrete choice data, function approximation, and data affected by multicollinearity and outliers. Some studies have applied ps to models with correlated and heteroscedastic errors, while others have reviewed multiple packages that implement smoothing terms for GAMs. This study uses simulation to examine the performance of semiparametric models within the GAM framework under different types of noise and in the presence of outliers. Four GAMs, based on the cr, ps, tp, and gp smoothers, were fitted to simulated data with different noise types and outliers at varying sample sizes. In our simulations, the cr model performs well in terms of deviance for the majority of sample sizes and all types of noise. With larger sample sizes, the ps model frequently performs well, particularly in terms of the Akaike information criterion (AIC) and generalized cross-validation (GCV) under Gaussian and heteroscedastic noise. The gp model excels at the smallest sample size under Gaussian and lognormal noise in terms of GCV, and the tp model frequently performs best under exponential and lognormal noise for larger samples in terms of AIC and GCV. For data containing outliers, the cr and tp models are effective with smaller sample sizes, while the gp model excels with larger sample sizes, based on AIC and GCV. In terms of deviance, the cr model consistently performs best across all sample sizes. Our results show that the sample size and the type of noise in the data have a significant impact on the performance of the smoothing model. No single model consistently outperforms the others across all noise types and sample sizes, suggesting that the choice of model should be guided by the specific goal of a study.
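As an illustration of the kind of simulation design summarized above, the following minimal Python sketch generates data under the four noise types named in the abstract (Gaussian, heteroscedastic, exponential, lognormal) and optionally injects outliers. It is not the authors' setup: the test function `true_signal`, the noise scale, the form of the heteroscedasticity, the centering of the skewed noise, and the outlier magnitude are all assumptions chosen only to make the example concrete, and the helper names are hypothetical.

```python
import math
import random


def true_signal(x):
    # Hypothetical smooth test function; the abstract does not specify one.
    return math.sin(2 * math.pi * x)


def draw_noise(kind, x, rng, scale=0.3):
    """Return one noise draw of the requested kind at covariate value x."""
    if kind == "gaussian":
        return rng.gauss(0.0, scale)
    if kind == "heteroscedastic":
        # Standard deviation grows with x (one assumed form among many).
        return rng.gauss(0.0, scale * (0.5 + x))
    if kind == "exponential":
        # Exponential with mean `scale`, centered to mean zero (assumption).
        return rng.expovariate(1.0 / scale) - scale
    if kind == "lognormal":
        # Lognormal(0, 0.5) has mean exp(0.5**2 / 2); subtract it to center.
        return scale * (rng.lognormvariate(0.0, 0.5) - math.exp(0.125))
    raise ValueError(f"unknown noise kind: {kind}")


def simulate(n, kind, outlier_frac=0.0, seed=0):
    """Simulate n points (x, y) with the given noise kind and outlier share."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))
    ys = [true_signal(x) + draw_noise(kind, x, rng) for x in xs]
    # Shift a random subset of responses by an assumed magnitude of 5.
    n_out = int(round(outlier_frac * n))
    for i in rng.sample(range(n), n_out):
        ys[i] += rng.choice([-1, 1]) * 5.0
    return xs, ys


if __name__ == "__main__":
    for kind in ("gaussian", "heteroscedastic", "exponential", "lognormal"):
        xs, ys = simulate(100, kind, outlier_frac=0.05, seed=1)
        print(kind, len(xs), round(min(ys), 2), round(max(ys), 2))
```

Each simulated data set could then be passed to a GAM fitting routine with the cr, ps, tp, or gp smoother, and the fits compared by deviance, AIC, and GCV across sample sizes.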