Scientific Reports (Mar 2025)

Analysis of the 50-mile ultramarathon distance using a predictive XGBoost model

  • Jonas Turnwald,
  • David Valero,
  • Pedro Forte,
  • Katja Weiss,
  • Elias Villiger,
  • Mabliny Thuany,
  • Volker Scheer,
  • Matthias Wilhelm,
  • Marilia Andrade,
  • Ivan Cuk,
  • Pantelis T. Nikolaidis,
  • Beat Knechtle

DOI
https://doi.org/10.1038/s41598-025-92581-w
Journal volume & issue
Vol. 15, no. 1
pp. 1 – 11

Abstract

Although the 50-mile ultramarathon is one of the most common race distances, it has received little scientific attention. The objective of this study was to assess how an athlete's age group, sex, and nationality, and the race location affect race speed. Using a dataset of ultramarathon races from 1863 to 2022, a machine learning model based on the XGBoost algorithm was developed to predict race speed from these variables. Model explainability tools, including relative feature importances and prediction distribution plots, were then used to investigate how each feature affects the predicted race speed. In terms of the XGBoost model's predictive power, the most important features were the race location and the athlete's sex. By athlete nationality, the three countries with the fastest predicted median race speeds were Slovenia, New Zealand, and Bulgaria; by race location, they were New Zealand, Croatia, and Serbia. The fastest median race speed was predicted for the 20–24 age group, but a marked age-related decline in performance became apparent only from the 40–44 age group onward. The model predicted faster race speeds for male than for female athletes. This study offers insights into factors influencing race speed in 50-mile ultramarathons, which may benefit athletes, coaches, and race organizers. Identifying the nationalities and event countries with fast race speeds provides a basis for further research on ultramarathon events.
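The abstract names the steps around the model (a race-speed target, age-group features, and ranking categories by median predicted speed) but gives no code. The following is a minimal Python sketch of those surrounding steps only, under stated assumptions: race speed is taken to be in km/h, age groups are assumed to be 5-year bins, and the trained regressor is treated as a black box that supplies predicted speeds. It does not reproduce the authors' XGBoost pipeline. The names `race_speed_kmh`, `age_group`, and `top_k_by_median` are hypothetical, and the usage data are made-up placeholders, not study results.

```python
from collections import defaultdict
from statistics import median

# 50 miles in kilometres (1 mile = 1.609344 km exactly) -> 80.4672 km.
FIFTY_MILES_KM = 50 * 1.609344


def race_speed_kmh(finish_time_hours: float) -> float:
    """Average speed in km/h for a 50-mile finish time given in hours.

    Assumption: speed is reported in km/h; the abstract does not state the unit.
    """
    if finish_time_hours <= 0:
        raise ValueError("finish time must be positive")
    return FIFTY_MILES_KM / finish_time_hours


def age_group(age: int) -> str:
    """Map an age in years to an assumed 5-year bin label such as '20-24'."""
    lower = (age // 5) * 5
    return f"{lower}-{lower + 4}"


def top_k_by_median(predictions, k=3):
    """Return the k category values with the highest median predicted speed.

    `predictions` is an iterable of (category_value, predicted_speed) pairs,
    e.g. a nationality or race country paired with a model's output for one
    athlete. The model that produced the speeds is not shown here.
    """
    groups = defaultdict(list)
    for category, speed in predictions:
        groups[category].append(speed)
    medians = {category: median(speeds) for category, speeds in groups.items()}
    # Sort categories by their median speed, fastest first.
    return sorted(medians, key=medians.get, reverse=True)[:k]


if __name__ == "__main__":
    print(round(race_speed_kmh(10.0), 5))  # 8.04672
    print(age_group(42))  # 40-44
    # Made-up predictions for placeholder categories "A", "B", "C".
    toy = [("A", 9.1), ("A", 8.7), ("B", 10.2), ("B", 9.8), ("C", 7.5)]
    print(top_k_by_median(toy, k=2))  # ['B', 'A']
```

Ranking by the median rather than the mean mirrors the abstract's use of median predicted speeds and keeps a few extreme predictions in a small category from dominating its rank.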