Computational and Structural Biotechnology Journal (Dec 2024)

3D clustering of gene expression data from systemic autoinflammatory diseases using self-organizing maps (Clust3D)

  • Orestis D. Papagiannopoulos,
  • Vasileios C. Pezoulas,
  • Costas Papaloukas,
  • Dimitrios I. Fotiadis

Journal volume & issue
Vol. 23
pp. 2152 – 2162

Abstract

Background and objective: Systemic autoinflammatory diseases (SAIDs) are characterized by widespread inflammation, but most of them lack specific biomarkers for accurate diagnosis. Although a number of machine learning algorithms have been used to analyze SAID datasets and aid the discovery of novel biomarkers, there is growing recognition of the importance of SAID timeseries clustering, as it can capture the temporal dynamics of gene expression patterns.

Methodology: This paper proposes a novel clustering methodology to efficiently associate three-dimensional data. The algorithm uses competitive learning to create a self-organizing neural network and adjusts neuron positions in a time-dependent, high-dimensional feature space so that the neurons serve as cluster centers. The quantitative evaluation of the clustering was based on well-known clustering indices. Furthermore, a differential expression analysis and classification pipeline was employed to assess the capability of the proposed methodology to extract more accurate pathway-specific genes from its clusters. For this purpose, a comparative analysis was conducted against a heuristic timeseries clustering method.

Results: The proposed methodology achieved better overall clustering index scores, as well as better classification metrics when using genes derived from its clusters. Notable cases include a threefold increase in the Calinski-Harabasz clustering index, a twofold improvement in the Davies–Bouldin clustering index and a ∼60% increase in the classification specificity score.

Conclusion: A novel clustering methodology was developed and applied to several gene expression timeseries datasets from systemic autoinflammatory diseases, and its ability to efficiently produce well-separated clusters compared to existing heuristic methods was demonstrated.
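The competitive-learning idea summarized in the Methodology can be illustrated with a minimal sketch. This is not the authors' Clust3D implementation, and it rests on several assumptions: each gene is represented as a small samples × time-points matrix, the neurons form a one-dimensional chain, the distance is the squared Frobenius distance, and the learning rate and neighborhood width decay linearly. All names (`frobenius_sq`, `train_som`, `assign_clusters`) and the toy data are hypothetical.

```python
import math
import random


def frobenius_sq(a, b):
    """Squared Frobenius distance between two equally shaped 2-D lists."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def train_som(items, n_neurons=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Fit a 1-D chain of neurons to matrix-valued items by competitive learning."""
    rng = random.Random(seed)
    # Assumption: initialise each neuron as a copy of a randomly chosen item.
    neurons = [[row[:] for row in it] for it in rng.sample(items, n_neurons)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                   # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 1e-3)  # shrinking neighborhood width
        order = list(range(len(items)))
        rng.shuffle(order)
        for i in order:
            item = items[i]
            # Competition: the best-matching unit (BMU) is the closest neuron.
            bmu = min(range(n_neurons), key=lambda k: frobenius_sq(item, neurons[k]))
            for k in range(n_neurons):
                # Cooperation: neurons near the BMU on the chain also move.
                h = math.exp(-((k - bmu) ** 2) / (2.0 * sigma ** 2))
                for r, row in enumerate(neurons[k]):
                    for c in range(len(row)):
                        row[c] += lr * h * (item[r][c] - row[c])
    return neurons


def assign_clusters(items, neurons):
    """Label each item with the index of its nearest neuron (cluster center)."""
    return [
        min(range(len(neurons)), key=lambda k: frobenius_sq(it, neurons[k]))
        for it in items
    ]


def shifted(matrix, delta):
    """Return a copy of the matrix with a constant added to every entry."""
    return [[v + delta for v in row] for row in matrix]


# Toy data (made up): rows are samples, columns are time points.
rising = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
falling = [[10.0, 9.0, 8.0], [10.0, 9.0, 8.0]]
genes = [shifted(rising, d) for d in (0.0, 0.1, 0.2)]
genes += [shifted(falling, d) for d in (0.0, 0.1, 0.2)]

neurons = train_som(genes, n_neurons=2)
labels = assign_clusters(genes, neurons)
print(labels)
```

After training, each neuron acts as a cluster center and every gene is labeled by its nearest neuron. A real analysis would use more neurons, typically on a two-dimensional grid, and would score the resulting partition with indices such as Calinski-Harabasz or Davies–Bouldin.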

Keywords