BMC Bioinformatics (Aug 2024)

Denoiseit: denoising gene expression data using rank based isolation trees

  • Jaemin Jeon,
  • Youjeong Suk,
  • Sang Cheol Kim,
  • Hye-Yeong Jo,
  • Kwangsoo Kim,
  • Inuk Jung

DOI
https://doi.org/10.1186/s12859-024-05899-z
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 19

Abstract

Read online

Abstract Background Selecting informative genes or eliminating uninformative ones before any downstream gene expression analysis is a standard task with great impact on the results. A carefully curated gene set significantly enhances the likelihood of identifying meaningful biomarkers. Method In contrast to the conventional forward gene search methods that focus on selecting highly informative genes, we propose a backward search method, DenoiseIt, that aims to remove potential outlier genes yielding a robust gene set with reduced noise. The gene set constructed by DenoiseIt is expected to capture biologically significant genes while pruning irrelevant ones to the greatest extent possible. Therefore, it also enhances the quality of downstream comparative gene expression analysis. DenoiseIt utilizes non-negative matrix factorization in conjunction with isolation forests to identify outlier rank features and remove their associated genes. Results DenoiseIt was applied to both bulk and single-cell RNA-seq data collected from TCGA and a COVID-19 cohort to show that it proficiently identified and removed genes exhibiting expression anomalies confined to specific samples rather than a known group. DenoiseIt also showed to reduce the level of technical noise while preserving a higher proportion of biologically relevant genes compared to existing methods. The DenoiseIt Software is publicly available on GitHub at https://github.com/cobi-git/DenoiseIt

Keywords