PLoS ONE (Jan 2018)
Sound Colless-like balance indices for multifurcating trees.
Abstract
The Colless index is one of the most popular and natural balance indices for bifurcating phylogenetic trees, but it is not meaningful for multifurcating trees. In this paper we propose a family of Colless-like balance indices that generalize the Colless index to multifurcating phylogenetic trees. Each index in the family is determined by the choice of a dissimilarity D and a weight function f. A balance index is sound when the most balanced phylogenetic trees according to it are exactly the fully symmetric ones. Unfortunately, not every Colless-like balance index is sound in this sense. We then prove that, taking f(n) = ln(n + e) or f(n) = e^n as the weight function, the resulting index is sound for every dissimilarity D. Next, for each of these two functions f and for three popular dissimilarities D (the variance, the standard deviation, and the mean deviation from the median), we find the most unbalanced phylogenetic trees, according to the corresponding index, for any given number n of leaves. The results show that the growth rate of the function f influences the notion of "balance" measured by the indices it defines. Finally, we introduce our R package "CollessLike," which, among other functionalities, computes Colless-like indices of trees and compares them with their distribution under Chen-Ford-Winkel's α-γ-model for multifurcating phylogenetic trees. As an application, we show that the trees in TreeBASE do not seem to follow either the uniform model for multifurcating trees or the α-γ-model, for any values of α and γ.
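To make the roles of D and f concrete, the following Python sketch computes one possible Colless-like index. The abstract does not spell out the definition, so the code works under an assumption: each node contributes f(out-degree) to the "f-size" of every subtree containing it, and the index is the sum, over internal nodes, of the dissimilarity D applied to the f-sizes of the child subtrees. The nested-list tree encoding and all function names are hypothetical and are not the API of the CollessLike R package.

```python
import math
import statistics

# Weight functions named in the abstract.
def f_ln(n):
    return math.log(n + math.e)

def f_exp(n):
    return math.exp(n)

# Dissimilarities. The variance here is the population variance;
# the normalization used in the paper may differ (assumption).
def mdm(xs):
    """Mean deviation from the median. For two values this is half
    their absolute difference."""
    med = statistics.median(xs)
    return sum(abs(x - med) for x in xs) / len(xs)

def variance(xs):
    return statistics.pvariance(xs)

def colless_like(tree, f=f_ln, D=mdm):
    """Assumed definition: sum over internal nodes of D applied to the
    f-sizes of the child subtrees. A tree is a nested list of children;
    a leaf is the empty list []."""
    def walk(node):
        # Returns (index of the subtree, f-size of the subtree).
        if not node:
            return 0.0, f(0)
        results = [walk(child) for child in node]
        sizes = [size for _, size in results]
        index = sum(idx for idx, _ in results) + D(sizes)
        return index, f(len(node)) + sum(sizes)
    return walk(tree)[0]

# Hypothetical example trees with four leaves each.
symmetric = [[[], []], [[], []]]      # fully symmetric bifurcating tree
star = [[], [], [], []]               # single node with four leaf children
caterpillar = [[], [[], [[], []]]]    # maximally lopsided bifurcating tree

for name, tree in [("symmetric", symmetric), ("star", star),
                   ("caterpillar", caterpillar)]:
    print(name, colless_like(tree, f_ln, mdm), colless_like(tree, f_exp, variance))
```

Under this assumed definition, fully symmetric trees score exactly zero because every internal node sees identical child f-sizes, while the caterpillar scores strictly above zero. Swapping f_ln for f_exp changes how heavily high-degree nodes weigh in the f-size, which is one way to explore the abstract's remark that the growth of f affects the notion of balance.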