Applied Artificial Intelligence (Nov 2017)

Machine Learning Applications in Baseball: A Systematic Literature Review

  • Kaan Koseler
  • Matthew Stephan

DOI
https://doi.org/10.1080/08839514.2018.1442991
Journal volume & issue
Vol. 31, no. 9-10
pp. 745 – 763

Abstract

Statistical analysis of baseball has long been popular, albeit only in a limited capacity until relatively recently. In particular, analysts can now apply machine learning algorithms to large baseball data sets to derive meaningful insights into player and team performance. In the interest of stimulating new research and serving as a go-to resource for academic and industrial analysts, we perform a systematic literature review of machine learning applications in baseball analytics. The approaches employed in the literature fall mainly under three problem classes: regression, binary classification, and multiclass classification. We categorize these approaches, provide our insights on possible future applications, and conclude with a summary of our findings. We find that two algorithms dominate the literature: (1) support vector machines for classification problems and (2) k-nearest neighbors for both classification and regression problems. We postulate that the recent proliferation of neural networks in general machine learning research will soon carry over into baseball analytics.
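
As an illustration of why k-nearest neighbors can serve both classification and regression problems, the following is a minimal sketch in plain Python. It is not taken from the reviewed studies: the function names, the feature choice (pitch speed and spin rate), and every data value are hypothetical and synthetic, chosen only to show that the same neighbor search feeds either a majority vote or an average.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+


def k_nearest(train, query, k):
    """Return the k (features, target) rows closest to query."""
    return sorted(train, key=lambda row: dist(row[0], query))[:k]


def knn_classify(train, query, k=3):
    """Classification: majority vote over the neighbors' labels."""
    labels = [label for _, label in k_nearest(train, query, k)]
    return Counter(labels).most_common(1)[0][0]


def knn_regress(train, query, k=3):
    """Regression: mean of the neighbors' numeric targets."""
    targets = [target for _, target in k_nearest(train, query, k)]
    return sum(targets) / len(targets)


# Synthetic toy rows (NOT real pitch data): (speed in mph, spin in 1000 rpm).
pitches = [
    ((95.0, 2.3), "fastball"),
    ((94.0, 2.2), "fastball"),
    ((96.0, 2.4), "fastball"),
    ((78.0, 2.6), "curveball"),
    ((79.0, 2.7), "curveball"),
    ((77.0, 2.5), "curveball"),
]

# Synthetic one-feature rows with a numeric target, for the regression case.
numeric_rows = [
    ((1.0,), 10.0),
    ((2.0,), 20.0),
    ((3.0,), 30.0),
    ((10.0,), 100.0),
]

predicted_label = knn_classify(pitches, (93.0, 2.25), k=3)
predicted_value = knn_regress(numeric_rows, (2.0,), k=3)
```

In this sketch the features are left unscaled, so the speed column dominates the distance; a real analysis would likely standardize features first and choose k by cross-validation.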