BMC Bioinformatics (Jun 2024)

penalizedclr: an R package for penalized conditional logistic regression for integration of multiple omics layers

  • Vera Djordjilović,
  • Erica Ponzi,
  • Therese Haugdahl Nøst,
  • Magne Thoresen

DOI
https://doi.org/10.1186/s12859-024-05850-2
Journal volume & issue
Vol. 25, no. 1
pp. 1 – 14

Abstract

Background: The matched case–control design, until recently used mainly in epidemiological studies, is becoming customary in biomedical applications as well. For instance, in omics studies it is quite common to compare cancer and healthy tissue from the same patient. Furthermore, researchers today routinely collect data from multiple and varied sources that they wish to relate to case–control status. This highlights the need to develop and implement statistical methods that take these trends into account.

Results: We present penalizedclr, an R package that implements the penalized conditional logistic regression model for analyzing matched case–control studies. It allows different penalties for different blocks of covariates, and is therefore particularly useful in the presence of multi-source omics data. Both L1 and L2 penalties are implemented. Additionally, the package implements stability selection for variable selection in the considered regression model.

Conclusions: The proposed method fills a gap in the available software for fitting high-dimensional conditional logistic regression models that account for the matched design and the block structure of predictors/features. The output consists of a set of selected variables that are significantly associated with case–control status. These variables can then be investigated in terms of functional interpretation or validated in further, more targeted studies.
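To make the idea of block-specific penalties concrete, the following is a minimal Python sketch of a penalized negative conditional log-likelihood for 1:1 matched pairs. It is an illustration only and is not the penalizedclr API or the authors' implementation: the function name `penalized_neg_cond_loglik`, its arguments, and the restriction to one case and one control per stratum are all assumptions made for this example. For a 1:1 pair, the conditional likelihood depends only on the within-pair covariate difference (case minus control), and each block of covariates (for example, one omics layer) receives its own L1 and L2 weights.

```python
import math


def penalized_neg_cond_loglik(diffs, beta, blocks, lambda1, lambda2):
    """Penalized negative conditional log-likelihood, 1:1 matched pairs.

    Hypothetical helper for illustration (not part of penalizedclr).

    diffs   : list of per-pair covariate differences (case minus control)
    beta    : list of coefficients, one per covariate
    blocks  : list of index lists, one per covariate block
    lambda1 : L1 penalty weight for each block
    lambda2 : L2 penalty weight for each block
    """
    nll = 0.0
    for d in diffs:
        eta = sum(dj * bj for dj, bj in zip(d, beta))
        # Pair contribution: -log(exp(x_case.b) / (exp(x_case.b) + exp(x_ctrl.b)))
        # which equals log(1 + exp(-eta)); computed in a numerically stable form.
        nll += max(-eta, 0.0) + math.log1p(math.exp(-abs(eta)))

    penalty = 0.0
    for idx, l1, l2 in zip(blocks, lambda1, lambda2):
        penalty += l1 * sum(abs(beta[j]) for j in idx)   # block-wise L1 term
        penalty += l2 * sum(beta[j] ** 2 for j in idx)   # block-wise L2 term
    return nll + penalty


# Two matched pairs, two covariates, each covariate in its own block.
example_diffs = [[1.0, 0.0], [0.0, 1.0]]
example_value = penalized_neg_cond_loglik(
    example_diffs,
    beta=[1.0, -2.0],
    blocks=[[0], [1]],
    lambda1=[0.5, 0.1],
    lambda2=[0.0, 1.0],
)
```

Setting a larger L1 weight for one block shrinks more of that block's coefficients to exactly zero when this objective is minimized, which is how block-specific penalties let data sources of different size or signal strength be treated differently.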

Keywords