Proceedings of the International Conference on Applied Innovations in IT (Nov 2023)

Scrutinised and Compared: HVG Identification Methods in Terms of Common Metrics

  • Nadiia Kasianchuk,
  • Yevhenii Kukuruza,
  • Vladyslav Ostash,
  • Anastasiia Boshtova,
  • Dmytro Tsvyk,
  • Matvii Mykhailichenko

DOI
https://doi.org/10.25673/112994
Journal volume & issue
Vol. 11, no. 2
pp. 59 – 66

Abstract

Highly variable gene (HVG) identification plays a critical role in unravelling gene expression patterns and understanding cellular heterogeneity in single-cell RNA-sequencing (scRNA-seq) data. A plethora of software packages has been developed for this purpose; however, their comparative performance is yet to be explored. This study addresses this gap by independently evaluating 22 methods from 9 different packages to provide a comprehensive assessment of HVG identification methods. To this end, a set of common metrics was employed, namely overlap with highly and lowly expressed genes, runtime, and clustering indices (e.g., Calinski-Harabasz, Davies-Bouldin, and ROGUE). The results reveal substantial disparities not only between different methods but also in the performance of a single method across diverse datasets. In particular, the dimensionality of the provided data, the presence of spike-ins, and background noise are among the key factors influencing the results. These variations underscore the significant impact of dataset characteristics on analysis outcomes, so the nature of the data must be taken into account consistently. The study emphasises the urgent need for a standardised, data-driven assessment framework to ensure reliable and effective scRNA-seq analyses. This work serves as a valuable resource for both scRNA-seq software developers and experimental researchers seeking optimal methods for their investigations.
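
To illustrate one of the metrics named in the abstract, the overlap between a selected HVG set and the most highly or lowly expressed genes could be computed roughly as follows. This is a minimal sketch under stated assumptions, not the study's implementation: the abstract does not define the overlap measure, so the definition used here (the fraction of HVGs that fall among the top-k or bottom-k genes ranked by mean expression), the function name `expression_overlap`, the parameter `k`, and the toy data are all hypothetical.

```python
from typing import Dict, Iterable, Tuple


def expression_overlap(
    mean_expression: Dict[str, float],
    hvgs: Iterable[str],
    k: int,
) -> Tuple[float, float]:
    """Return the fraction of HVGs found among the k most highly expressed
    genes and among the k most lowly expressed genes (assumed definition)."""
    hvg_set = set(hvgs)
    if not hvg_set:
        raise ValueError("hvgs must not be empty")
    if not 0 < k <= len(mean_expression):
        raise ValueError("k must be between 1 and the number of genes")
    # Rank genes by mean expression, highest first; ties are broken by gene
    # name so that the ranking is deterministic.
    ranked = sorted(mean_expression, key=lambda g: (-mean_expression[g], g))
    top = set(ranked[:k])
    bottom = set(ranked[-k:])
    n = len(hvg_set)
    return len(hvg_set & top) / n, len(hvg_set & bottom) / n


# Toy data, invented for illustration only (not taken from the study).
toy_means = {"g1": 9.0, "g2": 7.5, "g3": 4.0, "g4": 1.2, "g5": 0.3, "g6": 0.1}
toy_hvgs = ["g1", "g3", "g5", "g6"]

high, low = expression_overlap(toy_means, toy_hvgs, k=2)
print(f"overlap with highly expressed genes: {high:.2f}")
print(f"overlap with lowly expressed genes:  {low:.2f}")
```

Under this assumed definition, a large overlap with lowly expressed genes would suggest that a method is selecting genes dominated by noise, while a large overlap with highly expressed genes would suggest that it is tracking abundance more than variability.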

Keywords