Latin-American Journal of Computing (Nov 2016)

Setting a generalized functional linear model (GFLM) for the classification of different types of cancer

  • Miguel Flores,
  • Guido Saltos,
  • Sergio Castillo Páez

Journal volume & issue
Vol. 3, no. 2
pp. 41 – 48

Abstract

Read online

This work aims to classify the DNA sequences of healthy and malignant cancer respectively. For this, supervised and unsupervised classification methods from a functional context are used; i.e. each strand of DNA is an observation. The observations are discretized, for that reason different ways to represent these observations with functions are evaluated. In addition, an exploratory study is done: estimating the mean and variance of each functional type of cancer. For the unsupervised classification method, hierarchical clustering with different measures of functional distance is used. On the other hand, for the supervised classification method, a functional generalized linear model is used. For this model the first and second derivatives are used which are included as discriminating variables. It has been verified that one of the advantages of working in the functional context is to obtain a model to correctly classify cancers by 100%. For the implementation of the methods it has been used the fda.usc R package that includes all the techniques of functional data analysis used in this work. In addition, some that have been developed in recent decades. For more details of these techniques can be consulted Ramsay, J. O. and Silverman (2005) and Ferraty et al. (2006).

Keywords