Epidemiology, Biostatistics and Public Health (Sep 2025)

Predicting Methodological Quality in No Profit Clinical Trial

  • Giulia Gambini,
  • Eleonora Fresi,
  • Annalisa De Silvestri,
  • Valeria Musella,
  • Virginia Valeria Ferretti,
  • Catherine Klersy

DOI
https://doi.org/10.54103/2282-0930/29517

Abstract

INTRODUCTION The SPIRIT 2013 Statement [1] has long represented the international gold standard for the content of clinical trial protocols, providing a comprehensive framework to ensure transparency, methodological rigor, and ethical soundness. However, the rapid evolution of trial methodologies, regulatory landscapes, and data-sharing practices has created the need for a revised standard. SPIRIT 2025 [2] was recently released, introducing updated and expanded guidance that reflects contemporary challenges and expectations, particularly in areas such as adaptive designs, patient involvement, and statistical analysis. Despite these efforts, numerous studies continue to document suboptimal adherence to SPIRIT recommendations [3], especially concerning trial design and statistical methods. Evaluating real-world adherence to SPIRIT guidelines [4] offers valuable insights into common shortcomings, systemic barriers, and areas requiring targeted support or training [5].

OBJECTIVES The aim of this study was to assess whether it is possible to predict which clinical trial protocols are likely to show high adherence to SPIRIT guidelines, with a specific focus on methodological items. We sought to identify study-level characteristics that may act as predictors of adherence, to better understand structural drivers of protocol quality and support future improvement strategies.

METHODS We retrieved information on the design and methodological features of clinical trial protocols submitted to the local Ethics Committee between 2021 and 2025 and recorded it in a centralized REDCap registry. Adherence to SPIRIT 2013 items related to methodology, study design, data collection, management, and analysis (items 9–21b) was assessed. The 2013 version was used because all included studies were submitted before the release of SPIRIT 2025. Each item was scored as fulfilled or not fulfilled, and an individual adherence score was computed as the total number of fulfilled items.
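The scoring rule described above can be illustrated with a minimal Python sketch. This is not the authors' code (the abstract reports analyses in Stata and R); the function names, the item labels, and the example values are hypothetical and serve only to show a score computed as a count of fulfilled items.

```python
from typing import Dict


def adherence_score(items: Dict[str, bool]) -> int:
    """Return the number of checklist items scored as fulfilled."""
    return sum(1 for fulfilled in items.values() if fulfilled)


def adherence_percent(items: Dict[str, bool]) -> float:
    """Express the score as a percentage of the items assessed."""
    return 100.0 * adherence_score(items) / len(items)


# Hypothetical protocol: item labels and values are invented for illustration
# and do not come from the study data.
example_protocol = {
    "9": True,
    "10": True,
    "11a": False,
    "12": True,
    "13": False,
    "14": True,
    "15": True,
    "16a": True,
}

print(adherence_score(example_protocol), adherence_percent(example_protocol))
```

Treating each item as a simple fulfilled/not fulfilled flag keeps the score an integer count, which is the form the abstract describes.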
We described the distribution of adherence scores using the median and interquartile range (IQR). Because adherence scores were not normally distributed, we dichotomized them at the median, classifying protocols as higher adherence ("good") or lower adherence ("poor"). To address the predictive objective, we applied a set of machine learning algorithms (e.g., Random Forest, eXtreme Gradient Boosting, and Boosted Logistic Regression) to a pool of candidate predictors selected for their potential relevance to the outcome. Model performance was evaluated using accuracy, area under the ROC curve (AUC), and F1 score. Variable importance was then assessed across models, and the most influential predictors were incorporated into a multivariable logistic regression model to evaluate their independent association with the outcome. Covariates included study characteristics related to sponsorship, methodological features, submission timing, and thematic focus. Odds ratios (OR) and corresponding 95% confidence intervals (CI) were reported to quantify the direction and strength of each association. Analyses were conducted using Stata release 19 and R version 4.4.3.

RESULTS All 132 protocols included in the analysis were non-profit interventional trials. Of these, 28% were monocentric, 59% multicentric within Italy, and 13% involved international sites. Overall, 32.6% of the studies had Italian sponsors with structured biostatistical support, 62.9% had other Italian sponsors for whom the availability of such support is unknown, and 4.5% had international sponsors, also with unknown biostatistical support. Randomization was used in 59% of protocols. Blinding was reported in 12% of protocols: 9% were double-blinded and 3% single-blinded. Median adherence to the overall SPIRIT checklist was 72% (IQR 52.3%–85.7%).
Among the methodological items, median adherence was 80% (IQR 60%–90%) for items 9–15, 100% (IQR 66%–100%) for items 16–17, and 62.5% (IQR 37.5%–87.5%) for items 18–21b. The outcome was defined as a summary score representing the number of methodological items fulfilled and was dichotomized at the median value. The presence of a biostatistics unit was significantly associated with higher methodological quality (OR = 3.44; 95% CI: 1.08–10.99; p = 0.037). Protocols in the Oncology/Infectious Diseases area were less likely to meet high-quality criteria (OR = 0.20; 95% CI: 0.045–0.870; p = 0.032). Protocols involving pediatric populations showed a similar tendency (OR = 0.085; 95% CI: 0.007–1.048; p = 0.054), although this association did not reach the conventional 0.05 significance threshold and its confidence interval includes 1. The model demonstrated good discriminative ability (AUC = 0.792) and excellent calibration (p = 0.91).

CONCLUSIONS By combining traditional statistical approaches with machine learning models, we gained a clearer understanding of which protocol features are predictive of high adherence to SPIRIT guidelines. Our findings suggest that the involvement of a multidisciplinary team, including biostatisticians, is strongly associated with better methodological quality. Protocols submitted by IRCCS institutions showed higher adherence. In contrast, trials involving special populations, particularly pediatric studies, were more likely to exhibit lower adherence, highlighting a need for targeted guidance and support in these contexts. Future analyses should include a larger sample and protocols evaluated by multiple Ethics Committees to enhance the generalizability of these findings.
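As a worked illustration of how the odds ratios, confidence intervals, and p-values reported in the results relate to one another, the Python sketch below back-calculates a standard error and a two-sided p-value from each published OR and 95% CI. It assumes the intervals are Wald intervals on the log-odds scale, which the abstract does not state; the function name `wald_from_or_ci` is hypothetical, and small discrepancies from the published p-values are expected because the inputs are rounded.

```python
import math

Z_95 = 1.959964  # normal quantile for a two-sided 95% interval


def wald_from_or_ci(odds_ratio: float, lower: float, upper: float):
    """Recover (log-odds, SE, z, two-sided p) from an OR and its 95% CI.

    Assumes a Wald interval: exp(beta +/- Z_95 * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2.0 * Z_95)
    z = beta / se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Values as reported in the abstract: (label, OR, CI lower, CI upper).
reported = [
    ("biostatistics unit", 3.44, 1.08, 10.99),
    ("pediatric population", 0.085, 0.007, 1.048),
    ("oncology/infectious diseases", 0.20, 0.045, 0.870),
]

for label, odds_ratio, lower, upper in reported:
    beta, se, z, p = wald_from_or_ci(odds_ratio, lower, upper)
    print(f"{label}: beta={beta:.3f}, SE={se:.3f}, z={z:.2f}, p={p:.3f}")
```

Under this assumption the recomputed p-values land close to the published ones, and the wide standard errors make the limited precision of the estimates with 132 protocols easy to see.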