Evidence Based Library and Information Practice (Mar 2022)
Natural Language Processing for Virtual Reference Analysis
Abstract
Objective – Chat transcript analysis can illuminate user needs by identifying common question topics, but traditional hand coding methods for topic analysis are time-consuming and poorly suited to large datasets. The research team explored the viability of automatic and natural language processing (NLP) strategies to perform rapid topic analysis on a large dataset of transcripts from a consortial chat service.
Methods – The research team developed a toolchain for data processing and analysis, which incorporated targeted searching for query terms using regular expressions and natural language processing using the Python spaCy library for automatic topic analysis. Processed data was exported to Tableau for visualization. Results were compared to hand-coded data to test the accuracy of conclusions.
Results – The processed data provided insights about the volume of chats originating from each participating library, the proportion of chats answered by operator groups for each library, and the percentage of chats answered by different staff types. The data also captured the top referring URLs for the service, course codes and file extensions mentioned, and query hits. Natural language processing revealed that the most common topics were related to citation, subscription databases, and finding full-text articles, which aligns with common question types identified in hand-coded transcripts.
Conclusion – Compared to hand coding, automatic and NLP approaches have benefits in terms of the volume of data that can be analyzed and the time frame required for analysis, but they come with a trade-off in accuracy, such as false hits. Therefore, computational approaches should be used to supplement traditional hand coding methods. As NLP becomes more accurate, approaches such as these may widen avenues of insight into virtual reference and patron needs.
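The regular-expression searching described in the Methods might look something like the following minimal sketch. It is illustrative only: the abstract does not publish the actual expressions, so the patterns, the query-term names, and the functions `scan_transcript` and `tally_query_hits` are all assumptions. The spaCy topic-analysis step is not shown.

```python
import re
from collections import Counter

# Hypothetical patterns; the study's real expressions are not given in the abstract.
# Course codes are assumed to look like "PSYC 1001" or "ENG101".
COURSE_CODE = re.compile(r"\b[A-Z]{3,4}\s?\d{3,4}\b")
# A small, assumed set of file extensions.
FILE_EXT = re.compile(r"\.(pdf|docx?|pptx?|xlsx?|epub)\b", re.IGNORECASE)
# Assumed query terms, keyed by a label used when counting hits.
QUERY_TERMS = {
    "citation": re.compile(r"\b(cit(e|ed|ing|ation)s?|apa|mla)\b", re.IGNORECASE),
    "full_text": re.compile(r"\bfull[\s-]?text\b", re.IGNORECASE),
}


def scan_transcript(text):
    """Return course codes, file extensions, and query-term labels found in one transcript."""
    return {
        "course_codes": COURSE_CODE.findall(text),
        "file_extensions": [ext.lower() for ext in FILE_EXT.findall(text)],
        "query_hits": [label for label, pattern in QUERY_TERMS.items() if pattern.search(text)],
    }


def tally_query_hits(transcripts):
    """Count how many transcripts contain each query term at least once."""
    counts = Counter()
    for text in transcripts:
        counts.update(scan_transcript(text)["query_hits"])
    return counts
```

Patterns this simple also show where false hits come from: the assumed course-code expression would match a string such as `ISBN 1234`, which is why comparing results against hand-coded data remains useful.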