Mathematics (Dec 2024)
Rootlets Hierarchical Principal Component Analysis for Revealing Nested Dependencies in Hierarchical Data
Abstract
Hierarchical clustering analysis (HCA) is a widely used unsupervised learning method. Its limitations, however, include imposing an artificial hierarchy onto non-hierarchical data and restricting every level to fixed two-way mergers. To address this, the current work describes a novel rootlets hierarchical principal component analysis (hPCA). This method extends typical hPCA using multivariate statistics to construct adaptive multiway mergers and Riemannian geometry to visualize nested dependencies. The rootlets hPCA algorithm and its projection onto the Poincaré disk are presented as examples of this extended framework. The algorithm constructs high-dimensional mergers using a single parameter, interpreted as a p-value. It decomposes a similarity matrix from GL(m, ℝ) using a sequence of rotations from SO(k), k ≤ m. Analysis shows that the rootlets algorithm limits the number of distinct eigenvalues of any merger. Nested clusters of arbitrary size but equal correlations are constructed and merged using their leading principal components. The visualization method then maps elements of SO(k) onto a low-dimensional hyperbolic manifold, the Poincaré disk. Rootlets hPCA was validated using simulated datasets with known hierarchical structure and a neuroimaging dataset with an unknown hierarchy. Experiments demonstrate that rootlets hPCA accurately reconstructs known hierarchies and, unlike HCA, does not impose a hierarchy on the data.
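The merge step summarized above, in which a cluster of equally correlated variables is replaced by its leading principal component, can be sketched as follows. This is an illustrative toy example assuming a standard PCA-based merge on a correlation matrix, not the authors' implementation; the simulated data, cluster size, and noise level are arbitrary choices.

```python
import numpy as np

# Illustrative sketch (not the authors' implementation): merge a cluster of
# approximately equicorrelated variables by replacing it with the score of
# its leading principal component, as in the merge step the abstract describes.

rng = np.random.default_rng(0)

# Simulate a cluster of k variables with roughly equal pairwise correlation
# by mixing one shared latent signal with independent noise.
n_samples, k = 500, 4
latent = rng.standard_normal(n_samples)
X = latent[:, None] + 0.5 * rng.standard_normal((n_samples, k))

# Correlation (similarity) matrix of the cluster.
R = np.corrcoef(X, rowvar=False)

# Eigendecomposition. For an exactly equicorrelated cluster with common
# correlation r, the spectrum has only two distinct eigenvalues:
# 1 + (k - 1) r (simple, leading) and 1 - r (repeated k - 1 times),
# consistent with the abstract's claim that mergers have few distinct eigenvalues.
eigvals, eigvecs = np.linalg.eigh(R)   # eigenvalues in ascending order
leading = eigvecs[:, -1]               # leading principal component direction

# Replace the whole cluster with its leading principal component score,
# a single representative variable carried up to the next level of the hierarchy.
merged = X @ leading
```

With strong shared signal, the leading eigenvalue dominates and the remaining eigenvalues are nearly equal, so the merged score preserves most of the cluster's common variance.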
Keywords