Mendel (Jun 2024)
CLaRA: Cost-effective LLMs Function Calling Based on Vector Database
Abstract
Since their introduction, function calls have become a widely used feature of the OpenAI API ecosystem. They reliably connect GPT's capabilities with external tools and APIs, and they were quickly adopted by other LLMs. One challenge is the number of tokens consumed per request, because the definition of every available function is sent as input with every request. We propose a simple solution that reduces the number of tokens by sending only the functions relevant to the user's question. The solution stores the function definitions in a vector database and uses the similarity score between the question and each stored function to pick only the functions that need to be sent. In our benchmarks, the solution decreases average prompt token consumption by 210% and the average prompt (input) price by 244% compared with the default function call. The solution is not limited to specific LLMs: it can be integrated with any LLM that supports function calls, making it a versatile tool for reducing token consumption. This means that even cheaper models with a high volume of functions can benefit from it.
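The selection step described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the function definitions, the `embed` helper (a bag-of-words stand-in for a real embedding model), the in-memory `INDEX` (a stand-in for a vector database), and the `top_k` and `min_score` parameters are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical function definitions; a real system would store full JSON schemas.
FUNCTIONS = [
    {"name": "get_weather",
     "description": "Get the current weather forecast for a city"},
    {"name": "send_email",
     "description": "Send an email message to a recipient"},
    {"name": "convert_currency",
     "description": "Convert an amount of money between two currencies"},
]


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Stand-in for the vector database: each function is indexed once by the
# embedding of its description.
INDEX = [(embed(f["description"]), f) for f in FUNCTIONS]


def select_functions(question, top_k=1, min_score=0.2):
    """Return at most top_k functions whose similarity to the question
    reaches min_score; only these would be sent to the LLM."""
    q = embed(question)
    scored = sorted(((cosine(q, vec), f) for vec, f in INDEX),
                    key=lambda pair: pair[0], reverse=True)
    return [f for score, f in scored[:top_k] if score >= min_score]


if __name__ == "__main__":
    selected = select_functions("What is the weather in Prague?")
    print([f["name"] for f in selected])
```

Only the definitions returned by `select_functions` would be attached to the request, so the prompt grows with the number of relevant functions rather than with the size of the whole catalogue. The threshold trades token savings against the risk of omitting a function the model needs.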
Keywords