Scientific Reports (Nov 2024)

Research and performance analysis of random forest-based feature selection algorithm in sports effectiveness evaluation

  • Yujiao Li,
  • Yingjie Mu

DOI
https://doi.org/10.1038/s41598-024-76706-1
Journal volume & issue
Vol. 14, no. 1
pp. 1 – 15

Abstract

Read online

Abstract The rapid progress in fields such as data mining and machine learning, as well as the explosive growth of sports big data, have posed new challenges to the research of sports big data. Most of the available sports data mining techniques concentrates on extracting and constructing effective features for basic sports data, which cannot be achieved simply by using data statistics. Especially in the targeted mining of sports data, traditional mining techniques still have shortcomings such as low classification accuracy and insufficient refinement. In order to solve the problem of low accuracy in traditional mining methods, the study combines the random forest algorithm with the artificial raindrop algorithm, and adopts a sports data mining method based on feature selection to achieve effective analysis of sports big data. This study is based on the evaluation method of motion effects using random forests, and uses feature extraction algorithms to study the motion effect impacts. It uses the information gain index to rank the importance of features and accurately gain the degree of influence of exercise on various indicators of the human body. Through simulation verification, the algorithm proposed by the research institute performs the best in accuracy and FI scores on the training and testing sets, with accuracies of 0.849 ± 0.021 and 0.819 ± 0.022, respectively, and F1 scores of 0.837 ± 0.020 and 0.864 ± 0.021, respectively. This indicates that the algorithm proposed by the research institute has high classification accuracy and performance proves that the Random Forest-based feature selection algorithm established in this study is superior to the existing traditional feature extraction and extraction methods in terms of both performance and accuracy. The proposal of this data analysis method has achieved accurate and efficient utilization of sports big data, which is of great significance for the development of the sports education industry.

Keywords