Discover Artificial Intelligence (Aug 2024)
Cost-efficient prompt engineering for unsupervised entity resolution in the product matching domain
Abstract
Entity Resolution (ER) is the problem of semi-automatically determining when two records refer to the same underlying entity, with applications ranging from healthcare to e-commerce. Traditional ER solutions required considerable manual expertise, including domain-specific feature engineering, as well as identification and curation of training data. Recently released large language models (LLMs) provide an opportunity to make ER more seamless and domain-independent. Because of LLMs’ pre-trained knowledge, the matching step in ER can be accomplished simply by prompting. However, it is also well known that LLMs can pose risks, that the quality of their outputs can depend on how prompts are engineered, and that the cost of using LLMs can be significant. Unfortunately, a systematic experimental study of the effects of different prompting methods, and their respective costs, for solving domain-specific entity matching using LLMs, such as ChatGPT, has been lacking thus far. This paper aims to address this gap by conducting such a study. We consider several relatively simple and cost-efficient ER prompt engineering methods and apply them to perform product matching on two real-world datasets widely used in the community. We select two well-known e-commerce datasets and provide extensive experimental results showing that an LLM like GPT-3.5 is viable for high-performing product matching and, interestingly, that more complicated and detailed (and hence, more expensive) prompting methods do not necessarily outperform simpler approaches. We provide brief qualitative and error analyses, including a study of the inter-consistency of different prompting methods to determine whether they yield stable outputs. Finally, we consider some limitations of LLMs when used as a product matcher in potential real-world e-commerce applications.
Keywords