IEEE Access (Jan 2022)
Decision Tree Algorithm Considering Distances Between Classes
Abstract
The decision tree algorithm (DT) is a commonly used data mining method for classification and regression. DT repeatedly divides a dataset into subsets based on impurity measures such as entropy and the Gini index, yielding relatively “pure” partitions whose observations belong to (almost) the same class. The Gini index is one of the representative indices for measuring the impurity of data. However, the Gini index does not take into account distances between classes. If the distances between classes are considered when measuring impurity, the decision tree algorithm can more clearly distinguish observations of different classes. To this end, a new decision tree algorithm based on the Rao-Stirling index is proposed that considers distances between classes. The Rao-Stirling index accounts for these distances by giving more weight to pairs of references in more distant classes when measuring data impurity. Experimental results indicate that the proposed method is superior in terms of accuracy, implying that considering the distances between classes can help improve accuracy in DT.
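The contrast between the two impurity measures can be illustrated with a minimal sketch. It assumes the commonly cited form of the Rao-Stirling index, the sum over ordered pairs of distinct classes of p_i * p_j * d_ij, with no exponents on the proportions or distances. The function names, the absolute-difference distance on numeric class labels, and the toy label lists are all hypothetical choices made for illustration; the abstract does not specify how the proposed method defines class distances.

```python
from collections import Counter


def class_proportions(labels):
    """Return a dict mapping each class to its share of the observations."""
    counts = Counter(labels)
    n = len(labels)
    return {c: k / n for c, k in counts.items()}


def gini(labels):
    """Gini index: 1 - sum_i p_i^2. Ignores how far apart the classes are."""
    p = class_proportions(labels)
    return 1.0 - sum(v * v for v in p.values())


def rao_stirling(labels, distance):
    """Assumed form: sum over ordered pairs i != j of p_i * p_j * d(i, j)."""
    p = class_proportions(labels)
    return sum(p[a] * p[b] * distance(a, b) for a in p for b in p if a != b)


def abs_distance(a, b):
    """Hypothetical class distance: absolute difference of numeric labels."""
    return abs(a - b)


def unit_distance(a, b):
    """All distinct classes are equally far apart."""
    return 1.0


# Two toy nodes with the same class proportions but different class spacing.
near = [1, 1, 2, 2]  # the two classes are adjacent
far = [1, 1, 5, 5]   # the two classes are far apart

print(gini(near), gini(far))
print(rao_stirling(near, abs_distance), rao_stirling(far, abs_distance))
print(rao_stirling(near, unit_distance))
```

In this sketch the Gini index cannot tell the two nodes apart because their class proportions are identical, whereas the distance-weighted index rates the node that mixes distant classes as more impure. With a unit distance between every pair of distinct classes, the assumed Rao-Stirling form reduces to the Gini index.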
Keywords