Scientific Reports (Jul 2021)
An artificial neural network approach integrating plasma proteomics and genetic data identifies PLXNA4 as a new susceptibility locus for pulmonary embolism
Abstract
Venous thromboembolism is the third most common cardiovascular disease and comprises two entities, deep vein thrombosis (DVT) and its potentially fatal form, pulmonary embolism (PE). While PE is observed in ~40% of patients with documented DVT, there are few biomarkers that can help identify patients at high risk of PE. To fill this need, we applied an artificial neural network (ANN) with two hidden layers to 376 antibodies and 19 biological traits measured in the plasma of 1388 DVT patients, with or without PE, from the MARTHA study. We used the LIME algorithm to obtain a linear approximation of the resulting ANN prediction model. As MARTHA patients had been genotyped on DNA arrays, a genome-wide association study (GWAS) was conducted on the LIME estimate. Detected single nucleotide polymorphisms (SNPs) were then tested for association with PE risk in MARTHA. The main findings were replicated in the EOVT study, composed of 143 PE patients and 196 DVT-only patients. The derived ANN model for PE achieved an accuracy of 0.89 and 0.79 in our training and testing sets, respectively. A GWAS on the LIME approximation identified a strong statistical association peak (rs1424597: p = 5.3 × 10⁻⁷) at the PLXNA4 locus. Homozygous carriers of the rs1424597-A allele were subsequently observed more frequently among PE patients than among DVT-only patients in both the MARTHA (2% vs. 0.4%, p = 0.005) and the EOVT (3% vs. 0%, p = 0.013) studies. In a sample of 112 patients with COVID-19, a disease known to involve endotheliopathy leading to acute lung injury and an increased risk of PE, decreased PLXNA4 levels were associated (p = 0.025) with worsened respiratory function. Using an original integrated proteomics and genetics strategy, we identified PLXNA4 as a new susceptibility gene for PE whose exact role now needs to be further elucidated.
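To make the "linear approximation" step more concrete, the following is a minimal LIME-style sketch, not the authors' implementation. It perturbs one input around a point of interest, weights the perturbed samples by proximity, and fits a weighted least-squares line to the black-box predictions. Everything named here is an assumption for illustration: `toy_black_box` is a hypothetical stand-in for the trained ANN, the two inputs stand in for protein measurements, and the sampling scale and kernel width are arbitrary choices.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_linear_surrogate(predict, instance, n_samples=500,
                           scale=0.5, kernel_width=1.0, seed=0):
    """Fit a proximity-weighted linear model to `predict` around `instance`.

    Returns (intercept, coefficients). All defaults are illustrative.
    """
    rng = random.Random(seed)
    d = len(instance)
    # Accumulate the weighted normal equations (X^T W X) beta = X^T W y.
    xtwx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xtwy = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [v + rng.gauss(0.0, scale) for v in instance]  # perturbed sample
        dist2 = sum((zi - vi) ** 2 for zi, vi in zip(z, instance))
        w = math.exp(-dist2 / kernel_width ** 2)  # closer samples count more
        row = [1.0] + z  # leading 1.0 is the intercept column
        y = predict(z)
        for i in range(d + 1):
            xtwy[i] += w * row[i] * y
            for j in range(d + 1):
                xtwx[i][j] += w * row[i] * row[j]
    beta = solve_linear_system(xtwx, xtwy)
    return beta[0], beta[1:]


def toy_black_box(x):
    """Hypothetical stand-in for an ANN's predicted probability of PE."""
    return 1.0 / (1.0 + math.exp(-(1.5 * x[0] - 0.8 * x[1])))


intercept, coefs = local_linear_surrogate(toy_black_box, [0.0, 0.0])
```

In this sketch, `coefs` gives the local direction and rough magnitude with which each input moves the black-box prediction near the chosen point. How the per-patient linear estimate was turned into the quantitative trait used for the GWAS is not specified in the abstract, so that step is deliberately left out.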