Genetics and Molecular Biology (2004-01-01)

Clustering and artificial neural networks: classification of variable lengths of Helminth antigens in set of domains

  • Thiago de Souza Rodrigues,
  • Lucila Grossi Gonçalves Pacífico,
  • Santuza Maria Ribeiro Teixeira,
  • Sérgio Costa Oliveira,
  • Antônio de Pádua Braga

Journal volume & issue
Vol. 27, no. 4
pp. 673 – 678



A new scheme is described for representing proteins of different lengths (in number of amino acids) so that they can be presented for classification to Artificial Neural Networks (ANNs) with a fixed number of inputs. After dimensionality reduction with Principal Component Analysis (PCA), the new vectors could be clustered with K-Means and subsequently classified. The new representation scheme was applied to a set of 112 antigen sequences from several parasitic helminths, retrieved from the National Center for Biotechnology Information and classified into four different groups. Despite the differences between the sequences, this bioinformatic tool produced groupings that correlated well with domains that are already well characterized, as confirmed by the PFAM database. Sequences were also grouped according to their similarity, as confirmed by hierarchical clustering with ClustalW.
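The abstract does not give the encoding itself, so the following is only a minimal sketch of the general pipeline (fixed-length vector, then PCA, then K-Means), under explicit assumptions. It substitutes plain amino-acid composition (a 20-dimensional residue-frequency vector) as a stand-in fixed-length representation; this is an assumption and not the authors' scheme. PCA is computed via SVD of the mean-centred data, and K-Means is Lloyd's algorithm with a deterministic farthest-point initialisation. The function names, toy sequences, and parameter values are hypothetical.

```python
import numpy as np

# The 20 standard amino acids; other symbols (X, B, Z, gaps) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq: str) -> np.ndarray:
    """Map a protein of any length to a fixed 20-dim vector of residue frequencies.

    ASSUMPTION: composition is a stand-in, not the encoding used in the paper.
    """
    seq = seq.upper()
    counts = np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)
    total = counts.sum()
    return counts / total if total > 0 else counts


def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project mean-centred rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:n_components].T


def kmeans(X: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Lloyd's K-Means; returns one cluster label per row of X."""
    # Farthest-point initialisation: start at X[0], then repeatedly add the
    # point farthest from all centroids chosen so far.
    centroids = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)

    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster empties out.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels


if __name__ == "__main__":
    # Hypothetical toy sequences of different lengths, two obvious families.
    seqs = ["GGGGAGGGG", "GGGAGGGGGGGG", "LLLLKLLL", "LLLLLLKKL"]
    X = np.array([composition_vector(s) for s in seqs])  # shape (4, 20)
    Z = pca(X, n_components=2)                           # shape (4, 2)
    print(kmeans(Z, k=2))
```

The composition vector makes every sequence the same length regardless of how many residues it has, which is the property a fixed-input ANN needs. The paper's four antigen groups would correspond to k=4 here, but the sketch makes no claim about the authors' actual parameter choices.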