Nongye tushu qingbao xuebao (May 2023)

Towards Known Unknowns: GPT Large Language Models Empower Human-Centered Information Retrieval

  • SHOU Jianqi

DOI
https://doi.org/10.13998/j.cnki.issn1002-1248.23-0386
Journal volume & issue
Vol. 35, no. 5
pp. 16 – 26

Abstract

[Purpose/Significance] Information retrieval (IR) is the foundation of public library services and has a profound societal impact through activities such as digital resource integration and the advancement of societal equity. Current methods rely mainly on either classical keyword-based, top-down retrieval in the style of the Online Public Access Catalog (OPAC) or point-to-point retrieval based on large language models (LLMs). Neither approach alone balances flexibility and reliability, which hinders the evolution towards user-centric IR systems. A new retrieval strategy that fosters a human-centered IR paradigm is therefore urgently needed.

[Method/Process] Contrary to the prevalent view that LLM methods such as GPT should completely replace the classical OPAC-like approach, we propose combining the merits of both strategies. This is the first effort of its kind within the scholarly community of public information service. We introduce the adaptive literature retrieval framework (ALRF), an approach grounded in the principles of cognitive science that addresses a critical user challenge in retrieval: the pursuit of known unknown knowledge (KUK). KUK arises when a user clearly understands the desired outcome but does not know the associated domain-specific terminology, and therefore lacks the entry point needed for a keyword-based search. ALRF's two-stage workflow targets exactly these situations: (i) users enter a description in natural language to identify target keywords, or keywords at a more abstract level, which implements a bottom-up strategy; (ii) users then conduct a top-down search with the extracted keywords. ALRF accommodates LLMs such as ChatGPT, GPT-4, and ERNIE Bot. The platform's effectiveness was carefully evaluated on literature retrieval in diverse fields, including science and engineering, biology and medicine, and literature and sociology.

[Results/Conclusions] ALRF significantly outperforms the standard methods, i.e., the LLM-based retrieval service and the OPAC-like retrieval service, in terms of both flexibility and reliability. This holds for tasks involving keyword abstraction (identifying keywords at a higher level of abstraction in the target domain) and property extraction (locating keywords with specific attributes at the same abstraction level as the target domain). ALRF thus addresses the pressing need for KUK retrieval and shows initial potential to meet the diverse and personalized retrieval requirements of users. This suggests that ALRF could potentially revolutionize public information services by placing humans at the center of their operation. A current obstacle to the wider adoption of ALRF in public IR in China is the pace at which Chinese corporations are developing powerful LLMs. We recommend that researchers keep abreast of these advances so that they understand the realistic possibilities and limitations in real-world applications.
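
The two-stage workflow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the catalog records, the `stub_keyword_extractor` function (a rule-based stand-in for an LLM call), and every other name below are hypothetical and serve only to show how a bottom-up keyword step could feed a top-down keyword search.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple


@dataclass(frozen=True)
class Record:
    """A toy catalog entry (hypothetical; not a real OPAC schema)."""
    title: str
    keywords: FrozenSet[str]


# Hypothetical in-memory catalog standing in for an OPAC index.
CATALOG = [
    Record("Introduction to Photovoltaics", frozenset({"photovoltaics", "solar cell"})),
    Record("Wind Turbine Design", frozenset({"wind turbine", "renewable energy"})),
    Record("Solar Cell Materials", frozenset({"solar cell", "semiconductor"})),
]


def stub_keyword_extractor(description: str) -> List[str]:
    """Stage (i), bottom-up: map a natural-language description to keywords.

    Assumption: a real system would call an LLM here; this rule-based stub
    only keeps the sketch self-contained and deterministic.
    """
    lowered = description.lower()
    keywords = []
    if "sunlight" in lowered and "electricity" in lowered:
        keywords.append("solar cell")
    if "wind" in lowered:
        keywords.append("wind turbine")
    return keywords


def opac_search(keywords: List[str], catalog: List[Record]) -> List[Record]:
    """Stage (ii), top-down: return records indexed under any given keyword."""
    wanted = {k.lower() for k in keywords}
    return [record for record in catalog if wanted & record.keywords]


def two_stage_retrieve(
    description: str,
    extract: Callable[[str], List[str]] = stub_keyword_extractor,
    catalog: List[Record] = CATALOG,
) -> Tuple[List[str], List[Record]]:
    """Run both stages and return the extracted keywords with the hits."""
    keywords = extract(description)
    return keywords, opac_search(keywords, catalog)
```

In this sketch a user who does not know the term "solar cell" can still describe "devices that turn sunlight into electricity"; the extractor supplies the keyword and the catalog search then returns the records indexed under it. The extracted keywords are returned alongside the hits so that the user can inspect or refine them before searching again.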

Keywords