Clustering and artificial neural networks: classification of variable lengths of Helminth antigens in set of domains

Genetics and Molecular Biology. 2004;27(4):673-678 DOI 10.1590/S1415-47572004000400032


Journal Title: Genetics and Molecular Biology

ISSN: 1415-4757 (Print); 1678-4685 (Online)

Publisher: Sociedade Brasileira de Genética

Society/Institution: Sociedade Brasileira de Genética

LCC Subject Category: Science: Biology (General): Genetics

Country of publisher: Brazil

Language of fulltext: English

Full-text formats available: PDF, HTML, XML



Thiago de Souza Rodrigues
Lucila Grossi Gonçalves Pacífico
Santuza Maria Ribeiro Teixeira
Sérgio Costa Oliveira
Antônio de Pádua Braga


Time From Submission to Publication: 16 weeks


Abstract

A new scheme is described for representing proteins of different lengths (in number of amino acids) so that they can be presented to Artificial Neural Networks (ANNs) with a fixed number of inputs for classification. K-means clustering of the new vectors, with subsequent classification, was made possible by first applying the dimension-reduction technique Principal Component Analysis (PCA). The new representation scheme was applied to a set of 112 antigen sequences from several parasitic helminths, selected from the National Center for Biotechnology Information (NCBI) and classified into four different groups. This bioinformatic tool established a good correlation with domains that are already well characterized, regardless of the differences between the sequences, as confirmed by the PFAM database. Additionally, sequences were grouped according to their similarity, as confirmed by hierarchical clustering using ClustalW.
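
The pipeline summarized in the abstract (fixed-length encoding, dimension reduction, clustering) can be illustrated with a minimal Python sketch. The abstract does not specify the representation scheme, so the sketch assumes a simple 20-dimensional amino-acid composition vector as a stand-in; it is not the authors' encoding. The function names (`composition_vector`, `pca_project`, `kmeans`, `cluster_sequences`), the number of principal components, and the default of four clusters (chosen to mirror the four groups mentioned) are likewise hypothetical, and the ANN classification step is omitted.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def composition_vector(seq):
    """Assumed encoding: relative frequency of each residue (length 20)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts) or 1  # avoid dividing by zero on empty input
    return [c / total for c in counts]


def pca_project(rows, n_components, iters=200):
    """Project rows onto leading principal directions (power iteration)."""
    n = len(rows)
    dim = len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    residual = [row[:] for row in centered]
    rng = random.Random(0)  # fixed seed keeps the sketch reproducible
    components = []
    for _comp in range(n_components):
        v = [rng.random() for _ in range(dim)]
        for _step in range(iters):
            scores = [dot(row, v) for row in residual]
            # w = X^T (X v): the scatter matrix applied to v
            w = [sum(s * row[j] for s, row in zip(scores, residual))
                 for j in range(dim)]
            norm = math.sqrt(dot(w, w))
            if norm == 0:  # no variance left; use a null direction
                v = [0.0] * dim
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflate: remove the found direction before searching for the next
        residual = [[x - dot(row, v) * vj for x, vj in zip(row, v)]
                    for row in residual]
    return [[dot(row, c) for c in components] for row in centered]


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialisation."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [points[0][:]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(far[:])
    labels = None
    for _step in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels


def cluster_sequences(seqs, n_components=2, k=4):
    """Encode, reduce with PCA, then cluster; returns one label per sequence."""
    vectors = [composition_vector(s) for s in seqs]
    reduced = pca_project(vectors, n_components)
    return kmeans(reduced, k)
```

Calling `cluster_sequences(sequences, n_components=2, k=4)` on a list of protein sequence strings returns one integer cluster label per sequence. The pure-Python PCA and K-means are written out only to keep the sketch dependency-free; a numerical library would normally be preferred for a data set of realistic size.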