PLoS ONE (Jan 2013)
Single nucleotide polymorphism network: a combinatorial paradigm for risk prediction.
Abstract
Risk prediction for a particular disease in a population through SNP genotyping typically relies on tests whose primary goal is to rank SNPs by their disease association. This manuscript presents a different approach: predicting risk through a network representation built from combined genotypic data (instead of a single allele or haplotype). The aim of the study is to classify the diseased group and to predict disease risk by identifying the responsible genotypes. Genotypic combinations are drawn from five independent loci on the platelet receptor genes P2RY1 and P2RY12. Genotype-sets constructed from these combinations serve as the network input. The network architecture consists of super-nodes (e.g., case and control) and nodes representing individuals, each described by a set of genotypes at M markers (M = number of SNPs). The analysis is further enriched by considering a set of networks derived from the parent network: with the super-nodes kept identical, each derived network carries a distinct combination of M-1 markers taken from the M markers. The ratio of case-specific to control-specific connections varies from one derived network to another. The network method has also been applied to a second case-control study, comprising oral cancer, precancer and control individuals, to check whether it improves the presentation and interpretation of data. The analyses reveal a perfect segregation between super-nodes, apart from a fraction of genotype-sets in a mixed state that are connected to both super-nodes (i.e., the common genotype-sets). This approach is useful for classifying whether an individual with a particular genotypic combination may belong to a group at risk of developing disease. In addition, it can identify the most important polymorphism, namely the one whose presence or absence in a population makes a large difference in the number of case and control individuals.
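The network construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that a genotype-set is linked to a super-node whenever at least one individual from that group carries it, and that a genotype-set linked to more than one super-node counts as "mixed". The function names (`link_genotype_sets`, `connection_counts`, `leave_one_out`) and the three-marker toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def link_genotype_sets(individuals, markers):
    """Map each genotype-set (restricted to `markers`) to the super-nodes it touches.

    Assumption: a link exists if at least one individual of that group
    carries the genotype-set.
    """
    links = {}
    for genotypes, group in individuals:
        key = tuple(genotypes[i] for i in markers)
        links.setdefault(key, set()).add(group)
    return links


def connection_counts(links):
    """Count genotype-sets specific to one super-node, and those shared ("mixed")."""
    counts = Counter()
    for groups in links.values():
        if len(groups) > 1:
            counts["mixed"] += 1
        else:
            counts[next(iter(groups))] += 1
    return counts


def leave_one_out(individuals, m):
    """Build the derived networks, each using M-1 of the M markers.

    Returns a dict keyed by the index of the dropped marker.
    """
    results = {}
    for markers in combinations(range(m), m - 1):
        dropped = (set(range(m)) - set(markers)).pop()
        results[dropped] = connection_counts(link_genotype_sets(individuals, markers))
    return results


# Hypothetical toy data: (genotypes at 3 markers, super-node label).
individuals = [
    (("AA", "CT", "GG"), "case"),
    (("AA", "CC", "GG"), "control"),
    (("AG", "CT", "GG"), "case"),
    (("AG", "CT", "GA"), "control"),
    (("AA", "CT", "GG"), "case"),
]

parent = connection_counts(link_genotype_sets(individuals, range(3)))
derived = leave_one_out(individuals, 3)

print("parent network:", dict(parent))
for dropped, counts in sorted(derived.items()):
    print(f"without marker {dropped}:", dict(counts))
```

Comparing the counts of each derived network with those of the parent network gives a rough indication of which marker's removal changes the case-specific and control-specific connections the most, which is the sense in which the abstract speaks of identifying the most important polymorphism.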