BMC Bioinformatics (Feb 2012)

ICGE: an R package for detecting relevant clusters and atypical units in gene expression

  • Irigoien Itziar,
  • Sierra Basilio,
  • Arenas Concepcion

DOI
https://doi.org/10.1186/1471-2105-13-30
Journal volume & issue
Vol. 13, no. 1
p. 30

Abstract

Background
Gene expression technologies have opened up new ways to diagnose and treat cancer and other diseases. Clustering algorithms are a useful approach for analyzing genome expression data: they attempt to partition the genes into groups that exhibit similar patterns of variation in expression level. An important problem in gene classification is to discern whether the clustering process can find a relevant partition, and to identify new gene classes. Classification has two key aspects: estimating the number of clusters, and deciding whether a new unit (a gene, a tumor sample, etc.) belongs to one of the previously identified clusters or to a new group.

Results
ICGE is a user-friendly R package that provides many functions related to this problem. It can:
  • identify the number of clusters using mixed variables, which applied biomedical researchers commonly encounter;
  • detect whether the data have a cluster structure;
  • identify whether a new unit belongs to one of the pre-identified clusters or to a novel group, and classify new units into the corresponding cluster.
The functions in the ICGE package are accompanied by help files and simple examples to facilitate its use.

Conclusions
We demonstrate the utility of ICGE by analyzing simulated and real data sets. The results show that ICGE could be very useful to a broad research community.