IEEE Access (Jan 2021)

Multi-Modal Chatbot in Intelligent Manufacturing

  • Tzu-Yu Chen,
  • Yu-Ching Chiu,
  • Nanyi Bi,
  • Richard Tzong-Han Tsai

DOI
https://doi.org/10.1109/ACCESS.2021.3083518
Journal volume & issue
Vol. 9
pp. 82118–82129

Abstract

Artificial intelligence (AI) has been widely used in various industries. In this work, we focus on what AI can do in manufacturing, in the form of a chatbot. We designed a chatbot that helps users complete an assembly task that simulates those in manufacturing settings. To recreate this setting, we have users assemble a Meccanoid robot through multiple stages with the help of an interactive dialogue system. By classifying the user's intent, the chatbot can provide answers or instructions when the user encounters problems during the assembly process. Our goal is to improve the system so that it captures users' needs by detecting their intent and thus provides relevant and helpful information. However, in a multi-step task, we cannot rely on intent classification that takes the user's question utterance as its only input, because questions raised at different steps may share the same intent yet require different responses. In this paper, we propose two methods to address this problem. The first captures not only textual features but also visual features, through the YOLO-based Masker with CNN (YMC) model; the second uses an autoencoder to encode multi-modal features for user intent classification. By incorporating visual information, we significantly improve the chatbot's performance in experiments conducted on different datasets.
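To make the fusion idea concrete, the following is a minimal sketch (in PyTorch) of intent classification over concatenated textual and visual features, with an autoencoder bottleneck in the spirit of the abstract. All layer sizes, the feature extractors implied by text_feat and visual_feat, and the intent label count are illustrative assumptions, not the authors' actual YMC architecture.

    import torch
    import torch.nn as nn

    class MultiModalIntentClassifier(nn.Module):
        """Illustrative fusion model: concatenate text and visual features,
        compress them with an autoencoder, and classify intent from the
        latent code. All dimensions are assumptions, not from the paper."""

        def __init__(self, text_dim=768, visual_dim=512,
                     latent_dim=128, num_intents=10):
            super().__init__()
            fused_dim = text_dim + visual_dim
            # Encoder compresses the fused multi-modal vector to a latent code.
            self.encoder = nn.Sequential(
                nn.Linear(fused_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim)
            )
            # Decoder reconstructs the fused vector; its loss regularizes the code.
            self.decoder = nn.Sequential(
                nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, fused_dim)
            )
            # Intent head predicts one of num_intents classes from the code.
            self.classifier = nn.Linear(latent_dim, num_intents)

        def forward(self, text_feat, visual_feat):
            fused = torch.cat([text_feat, visual_feat], dim=-1)
            z = self.encoder(fused)
            return self.classifier(z), self.decoder(z), fused

    # Usage with dummy features standing in for an utterance encoder and a
    # CNN over the camera view of the current assembly step:
    model = MultiModalIntentClassifier()
    text_feat = torch.randn(4, 768)
    visual_feat = torch.randn(4, 512)
    logits, recon, fused = model(text_feat, visual_feat)
    labels = torch.randint(0, 10, (4,))
    loss = (nn.functional.cross_entropy(logits, labels)
            + nn.functional.mse_loss(recon, fused))

Training such a model would typically combine the classification loss on the logits with a reconstruction loss on the decoder output, as in the last line above, so that the latent code both separates intents and preserves the fused multi-modal input.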

Keywords