Proceedings of the XXth Conference of Open Innovations Association FRUCT (Nov 2024)
Reducing the Long Tail Effect in E-Commerce through Self-Attention
Abstract
The long tail of search queries is a well-known problem that complicates the construction of efficient reverse indexes built on string-based query representations. Various techniques have been used to reduce the diversity of search terms, including proximity searching, fuzzy hashing, and collaborative filtering. However, these approaches often struggle with domain-specific entities such as brand names and product characteristics, which are essential for effective product search in online marketplaces. This study presents an approach that uses positional weighting to assess the significance of each search term based on its influence within the query context. The technique takes domain-specific elements into account to determine the relevance of each term more precisely. With the proposed C2T (context-dependent token) model, query diversity, as measured by the perplexity metric, was reduced by 48%.
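To make the reported metric concrete, the following minimal sketch shows one common way to quantify query diversity as perplexity, namely 2 raised to the Shannon entropy of the empirical distribution over distinct query strings. This definition is an assumption, since the abstract does not state how perplexity is computed. The toy query log and the `sorted_token_key` normalizer are hypothetical stand-ins for illustration only and are not the C2T model.

```python
import math
from collections import Counter


def perplexity(queries):
    """Return 2**H, where H is the Shannon entropy (in bits) of the
    empirical distribution over distinct query strings (assumed definition)."""
    counts = Counter(queries)
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return 2 ** entropy


def sorted_token_key(query):
    """Hypothetical normalizer: lowercase the query and sort its tokens so
    that word-order variants collapse to one key. Not the C2T model."""
    return " ".join(sorted(query.lower().split()))


# Hypothetical toy query log; real logs are far larger and more skewed.
raw_queries = [
    "nike air max 90",
    "air max 90 nike",
    "nike airmax 90",
    "adidas samba",
]

before = perplexity(raw_queries)
after = perplexity([sorted_token_key(q) for q in raw_queries])
reduction = 1.0 - after / before  # relative reduction in query diversity

print(f"before={before:.3f} after={after:.3f} reduction={reduction:.1%}")
```

Under this definition, a log of N equally frequent distinct queries has perplexity N, so merging variants of the same intent lowers the value. A relative reduction such as the one reported would then correspond to `1 - after / before`.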
Keywords