IEEE Access (Jan 2024)
Topological Risk-Landscape in Metric-Free Categorical Database
Abstract
The entropy-based Categorical Exploratory Data Analysis (CEDA) paradigm is refined to algorithmically explore the high-order directional associative patterns within the heterogeneous chronic disease dynamics captured by the Behavioral Risk Factor Surveillance System (BRFSS) database. Operating on this imbalanced categorical dataset, which is represented fully by its metric-free high-dimensional histogram, our algorithms conduct data-driven computations to investigate chronic disease mechanisms across four sub-populations along the age axis, culminating in a comprehensive systemic understanding. Within this categorical data-world, CEDA first recognizes the category-oriented 1D histogram as the simplest piece of explainable information. Then, using Kolmogorov’s randomness-property-based reliability check, CEDA identifies and confirms collectives of 1D histograms as major feature-categories of varying orders within each sub-population. The binary memberships of these confirmed major feature-categories are then arranged into a subject-vs-feature-category bipartite network heatmap. This heatmap reveals serial horizontal and vertical blocks, framed by clusters of similar subjects characterized by individual-risk-landscapes (IRL) against clusters of structurally dependent major feature-categories. From such block-series, sub-population-specific disease mechanisms emerge as collective high-order interacting effects, elucidating directional associative relationships from study subjects’ topological neighborhoods to response-categories. Notably, the topological individual-risk-landscape offers insight into complex system dynamics and simultaneously exposes atypical subjects as explainable errors across all machine learning classifiers.
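As a rough illustration of the quantities named above, the following Python sketch computes the Shannon entropy of a category-oriented 1D histogram, the conditional entropy of a response given a feature, and a binary subject-vs-feature-category membership matrix. It is a minimal sketch under stated assumptions, not the CEDA implementation: the function names, the toy records, and the feature names (`smoker`, `age_group`, `diabetes`) are hypothetical, and the reliability check and clustering steps are not reproduced.

```python
from collections import Counter
from math import log2


def shannon_entropy(labels):
    """Shannon entropy (in bits) of the 1D histogram of a categorical sequence."""
    n = len(labels)
    counts = Counter(labels)
    return sum(-(c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(response, feature):
    """H(response | feature): entropy of the response histograms, weighted by category size."""
    n = len(response)
    groups = {}
    for f, r in zip(feature, response):
        groups.setdefault(f, []).append(r)
    return sum(len(g) / n * shannon_entropy(g) for g in groups.values())


def membership_matrix(records, feature_categories):
    """Binary matrix: one row per subject, one column per (feature, category) pair."""
    return [
        [1 if rec[feat] == cat else 0 for feat, cat in feature_categories]
        for rec in records
    ]


# Hypothetical toy records; these are NOT BRFSS data.
records = [
    {"smoker": "yes", "age_group": "65+", "diabetes": "yes"},
    {"smoker": "yes", "age_group": "65+", "diabetes": "yes"},
    {"smoker": "no", "age_group": "18-34", "diabetes": "no"},
    {"smoker": "no", "age_group": "65+", "diabetes": "no"},
]

response = [rec["diabetes"] for rec in records]
smoker = [rec["smoker"] for rec in records]

h_response = shannon_entropy(response)
h_given_smoker = conditional_entropy(response, smoker)

# Columns of the heatmap input: selected (feature, category) pairs (an assumption).
columns = [("smoker", "yes"), ("age_group", "65+")]
heatmap_input = membership_matrix(records, columns)
```

A drop from `h_response` to `h_given_smoker` indicates that the feature's categories carry information about the response. Whether such a drop is reliable is what the randomness-based check in the text is meant to decide, and that step is outside this sketch.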
Keywords