Mathematics (Oct 2021)
Attribute Selecting in Tree-Augmented Naive Bayes by Cross Validation Risk Minimization
Abstract
As an important improvement to naive Bayes, Tree-Augmented Naive Bayes (TAN) exhibits excellent classification performance and efficiency since it allows that every attribute depends on at most one other attribute in addition to the class variable. However, its performance might be lowered as some attributes might be redundant. In this paper, we propose an attribute Selective Tree-Augmented Naive Bayes (STAN) algorithm which builds a sequence of approximate models each involving only the top certain attributes and searches the model to minimize the cross validation risk. Five different approaches to ranking the attributes have been explored. As the models can be evaluated simultaneously in one pass learning through the data, it is efficient and can avoid local optima in the model space. The extensive experiments on 70 UCI data sets demonstrated that STAN achieves superior performance while maintaining the efficiency and simplicity.
Keywords