PCaLDI: Explainable Similarity and Distance Metrics Using Principal Component Analysis Loadings for Feature Importance

Takafumi Nakanishi

doi:10.1109/ACCESS.2024.3387547

IEEE Access (Jan 2024)

Affiliations

Takafumi Nakanishi: Department of Data Science, Musashino University, Koto-ku, Tokyo, Japan

Journal volume & issue: Vol. 12
pp. 52623 – 52640

Abstract


In the evolving landscape of interpretable machine learning (ML) and explainable artificial intelligence, transparent and comprehensible ML models are crucial for data-driven decision-making. Traditional approaches struggle to distinguish whether the observed importance of features in similarity metrics computed in a principal component analysis (PCA)-transformed space stems from intrinsic characteristics of the data or from artifacts introduced by PCA. This ambiguity hampers accurate interpretation of feature contributions to similarity and distance metrics, which are fundamental to many data-analysis techniques. To address these challenges, I introduce PCA loading-dependent importance (PCaLDI), a novel method that explains similarity and distance metrics by combining the strengths of PCA loadings and permutation feature importance. PCaLDI uses PCA loadings to prioritize the most influential features, streamlining the assessment of feature importance. This approach provides clearer insight into feature contributions and reduces the computational inefficiencies inherent in traditional methods. Importantly, PCaLDI clarifies the contributions of individual features to similarity metrics within the PCA-transformed space, distinguishing effects attributable to PCA from the genuine influence of features on the similarity measures. This distinction is pivotal for accurately understanding the data structure and making informed decisions. Moreover, PCaLDI applies to any data format compatible with PCA, highlighting its broad applicability across data types. Comprehensive experiments and comparisons with baseline methods demonstrate that PCaLDI is both effective and efficient, offering rapid and accurate assessments of feature importance with substantially reduced computational demands.
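The abstract describes the method only at a high level, so the following is a minimal illustrative sketch under stated assumptions, not the author's published PCaLDI algorithm. It fits PCA with plain power iteration and ranks features by a hypothetical loading score (the sum over retained components of |eigenvector entry| times the square root of the eigenvalue). It then runs permutation feature importance only on the top-ranked features, measuring the mean absolute change in pairwise Euclidean distances in the fixed (not refitted) PCA space. All function names, the scoring rule, the pruning size, and the choice of Euclidean distance are assumptions made for illustration.

```python
import math
import random


def mean_center(X):
    """Subtract column means; return centred rows and the means."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in X], means


def covariance(Xc):
    """Sample covariance matrix of already-centred rows."""
    n, d = len(Xc), len(Xc[0])
    return [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_components(C, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric matrix via power iteration + deflation."""
    rng = random.Random(seed)
    d = len(C)
    C = [row[:] for row in C]
    comps, eigvals = [], []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        lam = sum(v[i] * C[i][j] * v[j] for i in range(d) for j in range(d))
        comps.append(v)
        eigvals.append(lam)
        # Deflate so the next pass converges to the next-largest eigenvector.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps, eigvals


def loading_priority(comps, eigvals):
    """Hypothetical rule: score_j = sum_k |v_kj| * sqrt(lambda_k); higher = more influential."""
    d = len(comps[0])
    scores = [sum(abs(v[j]) * math.sqrt(max(lam, 0.0))
                  for v, lam in zip(comps, eigvals)) for j in range(d)]
    return sorted(range(d), key=lambda j: -scores[j]), scores


def project(X, means, comps):
    """Map raw rows into the already-fitted PCA space (PCA is not refitted)."""
    return [[sum((row[j] - means[j]) * v[j] for j in range(len(row)))
             for v in comps] for row in X]


def pairwise_dists(Z):
    """Euclidean distance for every unordered pair of rows."""
    n = len(Z)
    return [math.dist(Z[a], Z[b]) for a in range(n) for b in range(a + 1, n)]


def permutation_importance(X, means, comps, features, n_repeats=5, seed=0):
    """Mean absolute change in PCA-space pairwise distances when one feature is shuffled."""
    rng = random.Random(seed)
    base = pairwise_dists(project(X, means, comps))
    result = {}
    for j in features:
        total = 0.0
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, col)]
            perm = pairwise_dists(project(Xp, means, comps))
            total += sum(abs(a - b) for a, b in zip(base, perm)) / len(base)
        result[j] = total / n_repeats
    return result


# Synthetic data: feature 0 has a large spread, feature 1 a moderate one,
# and feature 2 is near-constant noise.
gen = random.Random(42)
X = [[5 * gen.gauss(0, 1), 2 * gen.gauss(0, 1), 0.01 * gen.gauss(0, 1)]
     for _ in range(60)]

Xc, means = mean_center(X)
comps, eigvals = top_components(covariance(Xc), k=2)
order, scores = loading_priority(comps, eigvals)
top = order[:2]  # only the highest-loading features are permuted
importance = permutation_importance(X, means, comps, top)
print(order, importance)
```

In this sketch, the computational saving comes from the loading-based pruning step: feature 2, whose loadings are close to zero, is never permuted, so its distance recomputations are skipped entirely.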

Published in IEEE Access

ISSN: 2169-3536 (Online)
Publisher: IEEE
Country of publisher: United States
LCC subjects: Technology: Electrical engineering. Electronics. Nuclear engineering
Website: https://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
