An efficient clustering algorithm for partitioning Y-short tandem repeats data

Seman Ali; Bakar Zainab Abu; Isa Mohamed Nizam

doi:10.1186/1756-0500-5-557

BMC Research Notes (Oct 2012)

An efficient clustering algorithm for partitioning Y-short tandem repeats data

Seman Ali,
Bakar Zainab Abu,
Isa Mohamed Nizam

Affiliations

Seman Ali
Bakar Zainab Abu
Isa Mohamed Nizam

DOI: https://doi.org/10.1186/1756-0500-5-557
Journal volume & issue: Vol. 5, no. 1
p. 557

Abstract

Read online

Abstract Background Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear.

Published in BMC Research Notes

ISSN: 1756-0500 (Online)
Publisher: BMC
Country of publisher: United Kingdom
LCC subjects: Medicine; Science: Biology (General); Science: Science (General)
Website: https://bmcresnotes.biomedcentral.com

About the journal

Abstract

Keywords