IEEE Access (Jan 2020)
A Comparison Framework of Machine Learning Algorithms for Mixed-Type Variables Datasets: A Case Study on Tire-Performances Prediction
Abstract
Many engineering applications in the automotive, aeronautic, rubber, mechanics, and manufacturing industries collect multiple datasets measuring physical relations between input variables and performances for modeling purposes. The challenge is that such data are often high-dimensional and non-linear, and contain mixed variables, i.e., numerical and categorical features, requiring specific algorithms and encoding schemes to perform regression tasks efficiently. Moreover, defining an appropriate similarity criterion for mixed-type data is a non-trivial task, especially when it is meant to be used in regression problems. This paper discusses the use of different machine learning algorithms for regression problems involving mixed-type variables across multiple datasets. We use tire-related datasets as a case study to perform a rigorous, statistically founded comparison of different machine learning algorithms, combined with encoding schemes for handling mixed variables, in the prediction of tire performances. Friedman's statistic and the Nemenyi post-hoc test are used to test the significance of performance differences between techniques and encoding strategies. Our contributions come as a series of recommendations for efficiently handling mixed-type variables while achieving high performance on regression tasks over multiple datasets. Furthermore, we provide a flexible and efficient similarity function between tires that is useful for tire comparison, prediction, and retrieval tasks.
Keywords