Scientific Reports (Dec 2024)
Dual gene set enrichment analysis (dualGSEA); an R function that enables more robust biological discovery and pre-clinical model alignment from transcriptomics data
Abstract
Gene set enrichment analysis (GSEA) tools can identify biological insights within gene expression-based studies. Although their statistical performance has been compared, the downstream biological implications of choosing between pairwise and single sample forms of GSEA remain understudied. We compare the statistical and biological results obtained from various pre-ranking methods and options for pairwise GSEA, followed by a stand-alone comparison of GSEA, single sample GSEA (ssGSEA) and gene set variation analysis (GSVA). Pairwise GSEA and fGSEA provide similar results when deployed using a range of gene pre-ranking methods. However, pairwise GSEA can overgeneralise biological enrichment: when the most statistically significant signatures were assessed using single sample approaches, there was a complete absence of biological distinction between the groups being compared. To avoid these issues, we developed a new tool, dualGSEA, which provides users with multiple statistics and visuals to aid interpretation of results. This tool removes the possibility of users inadvertently interpreting statistical findings as equating to biological distinction between samples within groups-of-interest. dualGSEA provides a more robust basis for discovery research, one which allows users to assess statistical significance alongside biological distinctions in their data.
Keywords