Results in Engineering (Mar 2023)

A comparison of chatbot platforms with the state-of-the-art sentence BERT for answering online student FAQs

  • Kevin Peyton
  • Saritha Unnikrishnan

Journal volume & issue
Vol. 17
p. 100856

Abstract

Online learning enables academic institutions to accommodate increased student numbers at scale. With this scale come high demands on support staff for help in dealing with general questions relating to qualifications and registration. Chatbots that implement Frequently Asked Questions (FAQs) can be a valuable part of this support process. A chatbot provides constant availability in answering common questions, allowing support staff to engage in higher-value one-to-one communication with prospective students. A variety of approaches can be used to create these chatbots, including vertical platforms, frameworks, and direct model implementation. A comparative analysis is required to establish which approach provides the most accuracy for an existing, available dataset. This paper compares the intent classification results of two popular chatbot frameworks with a state-of-the-art Sentence BERT (SBERT) model that can be used to build a robust chatbot. A methodology is outlined which includes the preparation of a university FAQ dataset in a chatbot-friendly format for upload and training of each implementation. Results from the framework-based implementations are generated using their published Application Programming Interfaces (APIs). This enables intent classification using testing phrases and, finally, a comparison of F1 scores. Using ten intents comprising 284 training phrases and 85 testing phrases, it was found that the SBERT model outperformed all others with an F1 score of 0.99. Initial comparison with the literature suggests that the F1 scores obtained for Google Dialogflow (0.96) and Microsoft QnA Maker (0.95) are very similar to those from other benchmarking exercises in which NLU (Natural Language Understanding) platforms have been compared.
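
The SBERT approach summarised above, embedding each training phrase and assigning a test phrase to the intent of its most similar neighbour, can be illustrated with a short sketch. This is a minimal, assumed implementation using the sentence-transformers and scikit-learn libraries; the model name, the toy intents, and the nearest-phrase classification rule are illustrative assumptions, not the authors' exact configuration or dataset.

```python
# Minimal sketch of SBERT-based FAQ intent classification.
# Assumptions: sentence-transformers and scikit-learn are installed;
# the model name and the two toy intents below are illustrative only.
from sentence_transformers import SentenceTransformer
from sklearn.metrics import f1_score
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # hypothetical model choice

# Training phrases grouped by intent (toy stand-in for the FAQ dataset).
train = {
    "registration": ["How do I register?", "Where can I sign up for a course?"],
    "fees": ["How much does the course cost?", "What are the tuition fees?"],
}

intents, phrases = [], []
for intent, examples in train.items():
    for phrase in examples:
        intents.append(intent)
        phrases.append(phrase)

# Embed every training phrase once up front.
train_emb = model.encode(phrases)

def classify(query: str) -> str:
    """Assign the intent of the most similar training phrase."""
    q_emb = model.encode([query])
    sims = cosine_similarity(q_emb, train_emb)[0]
    return intents[sims.argmax()]

# Evaluate held-out test phrases and report a weighted F1 score,
# mirroring the comparison metric used in the paper.
test_phrases = ["How can I enrol?", "What is the price of tuition?"]
test_labels = ["registration", "fees"]
preds = [classify(q) for q in test_phrases]
print(f1_score(test_labels, preds, labels=list(train), average="weighted"))
```

In this framing the framework-based systems (Dialogflow, QnA Maker) would be trained through their published APIs on the same phrases, so all three implementations can be scored with the same F1 computation on the shared test set.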

Keywords