Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Colleague of Computer Science and Technology, Jilin University, Changchun 130012, China
Jiali Chen
Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Colleague of Computer Science and Technology, Jilin University, Changchun 130012, China
Xuping Xie
Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Colleague of Computer Science and Technology, Jilin University, Changchun 130012, China
Sen Yang
Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Colleague of Computer Science and Technology, Jilin University, Changchun 130012, China
Wei Pang
School of Mathematical and Computer Sciences, Heriot-Watt University, Edinburgh EH14 4AS, UK
Lan Huang
Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Colleague of Computer Science and Technology, Jilin University, Changchun 130012, China
Shuangquan Zhang
Key Laboratory of Symbol Computation and Knowledge Engineering, Ministry of Education, Colleague of Computer Science and Technology, Jilin University, Changchun 130012, China
Shishun Zhao
College of Mathematics, Jilin University, Changchun 130012, China
Support vector clustering (SVC) is a boundary-based algorithm, which has several advantages over other clustering methods, including identifying clusters of arbitrary shapes and numbers. Leveraged by the high generalization ability of the large margin distribution machine (LDM) and the optimal margin distribution clustering (ODMC), we propose a new clustering method: minimum distribution for support vector clustering (MDSVC), for improving the robustness of boundary point recognition, which characterizes the optimal hypersphere by the first-order and second-order statistics and tries to minimize the mean and variance simultaneously. In addition, we further prove, theoretically, that our algorithm can obtain better generalization performance. Some instructive insights for adjusting the number of support vector points are gained. For the optimization problem of MDSVC, we propose a double coordinate descent algorithm for small and medium samples. The experimental results on both artificial and real datasets indicate that our MDSVC has a significant improvement in generalization performance compared to SVC.