BioData Mining (May 2024)

The biomedical knowledge graph of symptom phenotype in coronary artery plaque: machine learning-based analysis of real-world clinical data

  • Jia-Ming Huan,
  • Xiao-Jie Wang,
  • Yuan Li,
  • Shi-Jun Zhang,
  • Yuan-Long Hu,
  • Yun-Lun Li

DOI
https://doi.org/10.1186/s13040-024-00365-1
Journal volume & issue
Vol. 17, no. 1
pp. 1 – 17

Abstract

Read online

Abstract A knowledge graph can effectively showcase the essential characteristics of data and is increasingly emerging as a significant means of integrating information in the field of artificial intelligence. Coronary artery plaque represents a significant etiology of cardiovascular events, posing a diagnostic challenge for clinicians who are confronted with a multitude of nonspecific symptoms. To visualize the hierarchical relationship network graph of the molecular mechanisms underlying plaque properties and symptom phenotypes, patient symptomatology was extracted from electronic health record data from real-world clinical settings. Phenotypic networks were constructed utilizing clinical data and protein‒protein interaction networks. Machine learning techniques, including convolutional neural networks, Dijkstra's algorithm, and gene ontology semantic similarity, were employed to quantify clinical and biological features within the network. The resulting features were then utilized to train a K-nearest neighbor model, yielding 23 symptoms, 41 association rules, and 61 hub genes across the three types of plaques studied, achieving an area under the curve of 92.5%. Weighted correlation network analysis and pathway enrichment were subsequently utilized to identify lipid status-related genes and inflammation-associated pathways that could help explain the differences in plaque properties. To confirm the validity of the network graph model, we conducted coexpression analysis of the hub genes to evaluate their potential diagnostic value. Additionally, we investigated immune cell infiltration, examined the correlations between hub genes and immune cells, and validated the reliability of the identified biological pathways. By integrating clinical data and molecular network information, this biomedical knowledge graph model effectively elucidated the potential molecular mechanisms that collude symptoms, diseases, and molecules.

Keywords