IEEE Access (Jan 2024)

VAE-Driven Multimodal Fusion for Early Cardiac Disease Detection

Junxin Wang, Juanen Li, Rui Wang, Xinqi Zhou

DOI: https://doi.org/10.1109/ACCESS.2024.3420444
Journal volume & issue: Vol. 12, pp. 90535–90551

Abstract


This study presents a novel multimodal deep learning model designed to improve early detection and diagnosis of chronic cardiac conditions such as Severe Left Ventricular Hypertrophy (SLVH) and Dilated Left Ventricle (DLV). Leveraging nearly 70,000 medical records from Columbia University Irving Medical Center, the model integrates early-stage structured data with chest X-ray (CXR) imagery, employing SMOTE to correct class imbalance. It uses a pre-trained EfficientNetB3 for image feature extraction, enhanced with SE-Block and CBAM attention mechanisms, while Transformer Encoder layers enrich the structured-data representation. Notably, it incorporates Variational Autoencoders (VAEs) to encode both modalities into a cohesive low-dimensional latent space, enabling an innovative multimodal fusion for cardiac disease risk classification. Ablation studies validate the essential role of each component, with VAE-driven feature fusion significantly boosting accuracy and stability (gains of 5.43% on the SLVH dataset and 14.13% on the DLV dataset). The model outperforms existing advanced multimodal frameworks, showing marked improvements in accuracy, recall, precision, and F1 score; in particular, it surpasses the leading CLIP model by 1.56% and 0.68% in accuracy on the 90–270-day SLVH and DLV datasets, respectively. High AUC values across disease stages highlight the model's robustness and consistently superior performance in predicting disease progression. These results underscore the potential of integrating multimodal data with advanced deep learning techniques to substantially enhance the diagnostic capabilities of medical tools, paving the way for earlier cardiac interventions and better patient outcomes.
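To make the described fusion pipeline concrete, the following is a minimal PyTorch sketch of the flow the abstract outlines: EfficientNetB3 image features and Transformer-encoded structured features, each mapped by a modality-specific VAE into a shared low-dimensional latent space and then fused for risk classification. All layer sizes, module names, the concatenation-based fusion, and the omission of the SE-Block/CBAM attention modules and SMOTE preprocessing are simplifying assumptions for illustration, not the authors' exact implementation.

```python
# Hypothetical sketch of the VAE-driven multimodal fusion described in the abstract.
# Dimensions, fusion-by-concatenation, and omitted attention/SMOTE steps are assumptions.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b3, EfficientNet_B3_Weights


class ModalityVAE(nn.Module):
    """Encode one modality's feature vector into a shared low-dimensional latent space."""
    def __init__(self, in_dim: int, latent_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.fc_mu = nn.Linear(256, latent_dim)      # latent mean
        self.fc_logvar = nn.Linear(256, latent_dim)  # latent log-variance
        self.decoder = nn.Sequential(                # reconstruction head for the VAE loss
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim)
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return z, mu, logvar, self.decoder(z)


class FusionClassifier(nn.Module):
    """EfficientNetB3 image features plus Transformer-encoded structured features,
    each projected by a VAE into a shared latent space, then fused for classification."""
    def __init__(self, n_tabular: int, latent_dim: int = 64, n_classes: int = 2):
        super().__init__()
        backbone = efficientnet_b3(weights=EfficientNet_B3_Weights.DEFAULT)
        backbone.classifier = nn.Identity()  # keep the 1536-d pooled image features
        self.image_backbone = backbone
        self.tab_embed = nn.Linear(n_tabular, 128)
        self.tab_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.img_vae = ModalityVAE(1536, latent_dim)
        self.tab_vae = ModalityVAE(128, latent_dim)
        self.head = nn.Linear(2 * latent_dim, n_classes)

    def forward(self, image, tabular):
        img_feat = self.image_backbone(image)  # (B, 1536)
        tab_feat = self.tab_encoder(self.tab_embed(tabular).unsqueeze(1)).squeeze(1)
        z_img, *_ = self.img_vae(img_feat)     # shared latent space, image modality
        z_tab, *_ = self.tab_vae(tab_feat)     # shared latent space, structured modality
        return self.head(torch.cat([z_img, z_tab], dim=-1))  # fused risk logits


# Example forward pass with a dummy CXR batch and 10 structured features per record.
model = FusionClassifier(n_tabular=10)
logits = model(torch.randn(2, 3, 300, 300), torch.randn(2, 10))
```

In training, the classification loss would be combined with each VAE's reconstruction and KL-divergence terms; the abstract's ablation results suggest it is this VAE-driven projection into a common latent space, rather than raw feature concatenation, that drives the reported accuracy and stability gains.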

Keywords