Automatic Clustering of e-Commerce Product Description

Haytham SALEEM Al-SARRAYRIH; Lars KNIPPING; Carmen PETCU

Journal of Applied Computer Science & Mathematics (Jan 2012)

Journal volume & issue: Vol. 6, no. 13
pp. 48 – 60

Abstract

Storing large amounts of digital information is a challenge, and searching for a particular object within a tremendous amount of data is like looking for a needle in a haystack. The increase in size and diversity of stored data makes retrieving the needed information more and more difficult. This research describes the use of clustering techniques and mathematical models in the field of information retrieval when dealing with text documents. In this study, traditional clustering and clustering extended by LSA are compared by applying them to a preprocessed text corpus, using the weighted centroid clustering algorithm and cosine similarity to measure the documents' correlation. LSA is assumed to improve the clustering by bringing related words closer together in a conceptual space. The study concludes that the clustering depends on the document representation and the similarity measure used. When dealing with short documents, LSA does not yield improved results compared to the traditional clustering techniques. The recall value is nevertheless higher because of the increased number of related documents returned. However, the results are less accurate than with traditional techniques.
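
As an illustration of the ideas named in the abstract, the following is a minimal sketch of cosine similarity over bag-of-words vectors combined with a simple single-pass centroid clustering. It is not the authors' weighted centroid algorithm and it omits both the preprocessing and the LSA step; the function names, the example product descriptions, and the `threshold` value are assumptions chosen purely for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term-frequency vector (hypothetical, whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term -> weight mappings."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Unweighted mean of a list of term-frequency vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    n = len(vectors)
    return {t: c / n for t, c in total.items()}


def cluster(docs, threshold=0.3):
    """Assign each document to the most similar existing centroid,
    or start a new cluster if no centroid reaches the threshold.
    The threshold value is an illustrative assumption."""
    vectors = [vectorize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(v, centroid([vectors[j] for j in members]))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


# Made-up short product descriptions, for illustration only.
docs = [
    "red cotton shirt",
    "blue cotton shirt",
    "usb phone charger",
    "fast usb charger",
]
print(cluster(docs))
```

Because short descriptions share few terms, the similarity scores in a sketch like this are driven by a handful of overlapping words, which is the setting in which the abstract reports that LSA does not improve on traditional clustering.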

Published in Journal of Applied Computer Science & Mathematics

ISSN: 2066-4273 (Print); 2066-3129 (Online)
Publisher: Stefan cel Mare University of Suceava
Country of publisher: Romania
LCC subjects: Science: Mathematics: Instruments and machines: Electronic computers. Computer science
Website: http://jacs.usv.ro/
