IEEE Access (Jan 2022)
Retrieval-Augmented Response Generation for Knowledge-Grounded Conversation in the Wild
Abstract
Users on the internet usually have conversations on interesting facts or topics along with diverse knowledge from the web. However, most existing knowledge-grounded conversation models consider only a single document regarding the topic of a conversation. The recently proposed retrieval-augmented models generate a response based on multiple documents; however, they ignore the given topic and use only the local context of the conversation. To this end, we introduce a novel retrieval-augmented response generation model that retrieves an appropriate range of documents relevant to both the topic and local context of a conversation and uses them for generating a knowledge-grounded response. Our model first accepts both topic words extracted from the whole conversation and the tokens before the response to yield multiple representations. It then chooses representations of the first N token and ones of keywords from the conversation and document encoders and compares the two groups of representation from the conversation with those groups of the document, respectively. For training, we introduce a new data-weighting scheme to encourage the model to produce knowledge-grounded responses without ground truth knowledge. Both automatic and human evaluation results with a large-scale dataset show that our models can generate more knowledgeable, diverse, and relevant responses compared to the state-of-the-art models.
Keywords