IEEE Access (Jan 2024)

PCaLDI: Explainable Similarity and Distance Metrics Using Principal Component Analysis Loadings for Feature Importance

  • Takafumi Nakanishi

DOI
https://doi.org/10.1109/ACCESS.2024.3387547
Journal volume & issue
Vol. 12
pp. 52623 – 52640

Abstract

In the evolving landscape of interpretable machine learning (ML) and explainable artificial intelligence, transparent and comprehensible ML models are crucial for data-driven decision-making. Traditional approaches cannot easily distinguish whether the observed importance of features in principal component analysis (PCA)-transformed similarity metrics stems from the intrinsic characteristics of the data or from artifacts introduced by the PCA. This ambiguity hampers the accurate interpretation of feature contributions to similarity and distance metrics, which are fundamental to many data-analysis techniques. To address these challenges, I introduce the novel PCA loading-dependent importance (PCaLDI), which explains similarity and distance metrics by combining the strengths of PCA loadings and permutation feature importance. PCaLDI uses PCA loadings to prioritize the most influential features, streamlining the assessment of feature importance. This approach provides clearer insights into the contributions of the features and reduces the computational inefficiencies inherent to traditional methods. Importantly, PCaLDI clarifies the contributions of individual features to similarity metrics within the PCA-transformed space, distinguishing between the effects attributable to PCA and the genuine influence of features on the similarity measures. This distinction is pivotal for accurately understanding the data structure and making informed decisions. Moreover, PCaLDI applies to any data format compatible with PCA, highlighting its broad applicability across data types. Comprehensive experiments and comparisons with baseline methods demonstrate that PCaLDI is both effective and efficient, offering rapid and accurate assessments of feature importance with substantially reduced computational demands.
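The abstract combines two ideas: PCA loadings pick out the candidate features, and permutation importance then measures how much each of them moves a similarity score in PCA space. The sketch below is a minimal illustration of that combination under stated assumptions. It is not the author's PCaLDI algorithm. The function names `pca_fit` and `loading_guided_permutation_importance` and the parameters `top_k` and `n_repeats` are hypothetical. Several other choices are also assumptions:

- Cosine similarity is used as the metric.
- Features are ranked by the row norm of the loading matrix.
- PCA is fitted once on the original data and kept fixed while features are permuted.
- Importance is the mean absolute change in the similarity between one pair of samples.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA via SVD; return the mean, principal axes (k x d) and loadings (d x k)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes; S holds the singular values.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    # Loadings: principal axes scaled by the square root of the explained variance.
    explained_var = S[:n_components] ** 2 / (X.shape[0] - 1)
    loadings = components.T * np.sqrt(explained_var)
    return mean, components, loadings


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def loading_guided_permutation_importance(X, i, j, n_components=2, top_k=3,
                                          n_repeats=20, seed=0):
    """Hypothetical sketch: permutation importance of the top_k features
    (ranked by loading magnitude) for the PCA-space similarity of samples i and j."""
    rng = np.random.default_rng(seed)
    mean, components, loadings = pca_fit(X, n_components)

    def project(M):
        # Assumption: reuse the PCA fitted on the original data (no refit).
        return (M - mean) @ components.T

    Z = project(X)
    baseline = cosine(Z[i], Z[j])

    # Rank features by the norm of their loading row and keep only the top_k,
    # so that only the most influential features are permuted.
    scores = np.linalg.norm(loadings, axis=1)
    candidates = np.argsort(scores)[::-1][:top_k]

    importance = {}
    for f in candidates:
        changes = []
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, f] = rng.permutation(Xp[:, f])  # break this feature's link to the samples
            Zp = project(Xp)
            changes.append(abs(baseline - cosine(Zp[i], Zp[j])))
        importance[int(f)] = float(np.mean(changes))
    return importance


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(50, 6))
    print(loading_guided_permutation_importance(X, 0, 1))
```

Permuting only the `top_k` highest-loading features, instead of every feature, is what saves computation in this sketch. Keeping the fitted projection fixed isolates each feature's effect on the similarity from changes in the PCA basis itself. Refitting PCA after each permutation would be an equally plausible variant.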

Keywords