Beverages (Oct 2022)
Exploration of Data Fusion Strategies Using Principal Component Analysis and Multiple Factor Analysis
Abstract
In oenology, statistical analyses are mostly used for descriptive purposes, usually on sensory and chemical data sets separately. Studies that combine the two are mostly supervised and typically seek to optimize discrimination, classification, or predictive power; unsupervised methods serve mainly as preliminary steps toward building successful supervised models. However, unsupervised methods also have the potential to combine different data sets into comprehensive, information-rich models. This study describes stepwise strategies for building data fusion models with unsupervised techniques at different fusion levels. Principal component analysis (PCA) and multiple factor analysis (MFA) were used to combine five data blocks (four chemical and one sensory). Model efficiency was evaluated using eigenvalues, and configurational similarity using RV coefficients (a multivariate generalization of the squared Pearson correlation between two data configurations). The MFA models were less efficient than PCA: their eigenvalues were distributed more gradually across model dimensions, so the leading dimensions captured a smaller share of the total variance. However, the MFA models were more representative than PCA, as indicated by high RV coefficients between the MFA solution and each individual block. Therefore, MFA approaches were better suited to multi-modal data than PCA. This work approached data fusion systematically, showing the types of decisions that must be made and how to evaluate their consequences. Properly integrating data sets, rather than simply concatenating them, is an important consideration in multi-modal data fusion.
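The abstract contrasts PCA on concatenated blocks with MFA and scores each model with eigenvalues (efficiency) and RV coefficients (similarity to each block). The NumPy sketch below illustrates those mechanics only and is not the authors' pipeline. It rests on several assumptions: the five blocks are synthetic random matrices with hypothetical names and sizes, PCA is computed by SVD of standardized data, and MFA is reduced to its core step of dividing each standardized block by the square root of its first PCA eigenvalue before a global PCA. The RV coefficient uses the standard definition on column-centered matrices, RV(X, Y) = tr(XX'YY') / sqrt(tr((XX')^2) * tr((YY')^2)).

```python
import numpy as np


def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca(X, n_components):
    """PCA via SVD of the column-centered matrix.

    Returns the first `n_components` sample scores and all eigenvalues
    (variance explained per dimension).
    """
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    eigenvalues = s**2 / Xc.shape[0]
    scores = U[:, :n_components] * s[:n_components]
    return scores, eigenvalues


def mfa_weighted(block):
    """Core MFA step: standardize a block, then divide it by the square root
    of its first PCA eigenvalue so that this eigenvalue becomes 1."""
    Z = standardize(block)
    _, eig = pca(Z, 1)
    return Z / np.sqrt(eig[0])


def rv_coefficient(X, Y):
    """RV coefficient between two configurations of the same samples (0 to 1)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sx, Sy = Xc @ Xc.T, Yc @ Yc.T  # sample-by-sample cross-product matrices
    # For symmetric Sx and Sy, sum(Sx * Sy) equals trace(Sx @ Sy).
    return np.sum(Sx * Sy) / np.sqrt(np.sum(Sx * Sx) * np.sum(Sy * Sy))


# Synthetic stand-in data: hypothetical block names and sizes, random values.
rng = np.random.default_rng(0)
n_samples = 12
block_sizes = {"chem_A": 40, "chem_B": 10, "chem_C": 6, "chem_D": 4, "sensory": 8}
blocks = {name: rng.normal(size=(n_samples, p)) for name, p in block_sizes.items()}
k = 2  # number of global dimensions compared with each block

# Plain concatenation: each block contributes total variance equal to
# its number of standardized variables.
pca_scores, pca_eig = pca(np.hstack([standardize(B) for B in blocks.values()]), k)

# MFA: concatenate the weighted blocks, then run one global PCA.
mfa_scores, mfa_eig = pca(np.hstack([mfa_weighted(B) for B in blocks.values()]), k)

for name, B in blocks.items():
    Z = standardize(B)
    print(f"{name:8s} RV(PCA)={rv_coefficient(pca_scores, Z):.2f} "
          f"RV(MFA)={rv_coefficient(mfa_scores, Z):.2f}")
print(f"first {k} dims, share of total variance: "
      f"PCA={pca_eig[:k].sum() / pca_eig.sum():.2f}, "
      f"MFA={mfa_eig[:k].sum() / mfa_eig.sum():.2f}")
```

Because each weighted block has a first eigenvalue of 1, the first global MFA eigenvalue always lies between 1 and the number of blocks, and no block can dominate merely by having more variables. Under plain concatenation, by contrast, the 40-variable block carries 40 of the 68 units of total variance. The printed numbers come from random noise and say nothing about the study's results.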
Keywords