Mathematics (Jun 2021)
Fuzzy Clustering Methods with Rényi Relative Entropy and Cluster Size
Abstract
In the last two decades, information entropy measures have been relevantly applied in fuzzy clustering problems in order to regularize solutions by avoiding the formation of partitions with excessively overlapping clusters. Following this idea, relative entropy or divergence measures have been similarly applied, particularly to enable that kind of entropy-based regularization to also take into account, as well as interact with, cluster size variables. Particularly, since Rényi divergence generalizes several other divergence measures, its application in fuzzy clustering seems promising for devising more general and potentially more effective methods. However, previous works making use of either Rényi entropy or divergence in fuzzy clustering, respectively, have not considered cluster sizes (thus applying regularization in terms of entropy, not divergence) or employed divergence without a regularization purpose. Then, the main contribution of this work is the introduction of a new regularization term based on Rényi relative entropy between membership degrees and observation ratios per cluster to penalize overlapping solutions in fuzzy clustering analysis. Specifically, such Rényi divergence-based term is added to the variance-based Fuzzy C-means objective function when allowing cluster sizes. This then leads to the development of two new fuzzy clustering methods exhibiting Rényi divergence-based regularization, the second one extending the first by considering a Gaussian kernel metric instead of the Euclidean distance. Iterative expressions for these methods are derived through the explicit application of Lagrange multipliers. An interesting feature of these expressions is that the proposed methods seem to take advantage of a greater amount of information in the updating steps for membership degrees and observations ratios per cluster. Finally, an extensive computational study is presented showing the feasibility and comparatively good performance of the proposed methods.
Keywords