IEEE Access (Jan 2024)

A Hybrid Framework of Transformer Encoder and Residential Conventional for Cardiovascular Disease Recognition Using Heart Sounds

  • Riyadh M. Al-Tam,
  • Aymen M. Al-Hejri,
  • Ebrahim Naji,
  • Fatma A. Hashim,
  • Sultan S. Alshamrani,
  • Abdullah Alshehri,
  • Sachin M. Narangale

DOI
https://doi.org/10.1109/ACCESS.2024.3451660
Journal volume & issue
Vol. 12
pp. 123099 – 123113

Abstract

Valvular heart disease (VHD) is one of the primary causes of cardiovascular illness, with high mortality rates worldwide. Early detection of VHD enables optimal treatment and prevents the onset of further heart problems. Despite existing diagnostic capabilities, many cases of VHD are still misdiagnosed. Therefore, in this work, four AI models are proposed to diagnose heart sounds from phonocardiography (PCG) recordings: VGG16, ResNet50, a Transformer Encoder, and a hybrid framework combining ResNet50 and the Transformer Encoder with a Multilayer Perceptron (MLP). Two benchmark datasets are used: the Yaseen dataset and the PhysioNet 2016 challenge dataset. The proposed AI models classify heart sounds in two scenarios: multi-class (aortic stenosis, mitral stenosis, mitral regurgitation, mitral valve prolapse, and normal) and binary (normal and abnormal). Mel-frequency cepstral coefficient (MFCC) features are extracted from the heart sound files at a 22050 Hz sample rate, with a maximum clip duration of 12 seconds per audio file. A pre-processing step applies z-score normalization and the discrete cosine transform (DCT), and the sound data are augmented with noise injection, time stretching, and time shifting. Using 5-fold cross-validation, the proposed hybrid framework outperforms the other models on the Yaseen dataset, reaching an average of 99.80% for both accuracy and AUC. On the PhysioNet 2016 challenge dataset, it achieves averages of 97.40% accuracy and 96.40% AUC. Furthermore, the model reaches 100.0% accuracy and AUC when all abnormalities in the Yaseen dataset are combined into a single abnormal class alongside the normal class for binary classification. These findings indicate that the proposed hybrid AI model can reliably recognize heart sound recordings, helping domain experts recommend the most effective treatment and prompting further investigation of suspicious cases.
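The pre-processing and augmentation pipeline described in the abstract (fixed-length clips at a 22050 Hz sample rate, z-score normalization, and noise/stretch/shift augmentation) can be sketched roughly as follows. This is a minimal illustration with NumPy, not the authors' implementation: the noise scale, shift amount, and the linear-interpolation stretch are hypothetical stand-ins (a production pipeline would typically use a library routine such as `librosa.effects.time_stretch`, and would follow with MFCC extraction).

```python
import numpy as np

SR = 22050            # sample rate stated in the paper
MAX_LEN = 12 * SR     # 12-second maximum clip duration

def zscore(x):
    # z-score normalization (per-clip), as in the paper's pre-processing step
    return (x - x.mean()) / (x.std() + 1e-8)

def add_noise(x, scale=0.005, rng=None):
    # additive Gaussian noise; scale is an illustrative choice
    rng = rng or np.random.default_rng(0)
    return x + scale * rng.standard_normal(x.shape)

def shift(x, n):
    # circular time shift by n samples
    return np.roll(x, n)

def stretch(x, rate):
    # naive time-stretch via linear interpolation (hypothetical stand-in
    # for a proper phase-vocoder stretch); rate < 1 lengthens the clip
    idx = np.arange(0, len(x), rate)
    return np.interp(idx, np.arange(len(x)), x)

# dummy 1-second tone, padded to the fixed 12-second length
clip = np.sin(2 * np.pi * 440 * np.arange(SR) / SR)
clip = np.pad(clip, (0, MAX_LEN - len(clip)))

augmented = [add_noise(clip), shift(clip, 2000), stretch(clip, 0.9)]
norm = zscore(clip)
print(norm.shape, round(float(norm.mean()), 6))
```

Each augmented variant would then pass through the same normalization and MFCC extraction before being fed to the models, tripling the effective amount of training audio.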

Keywords