Computers and Education: Artificial Intelligence (Jan 2023)
Automated coding of student chats, a trans-topic and language approach
Abstract
Computer-Supported Collaborative Learning (CSCL) is known to be productive if well structured. In CSCL, students construct knowledge by performing learning tasks while communicating about their work. This communication most often takes place through online written chats. Understanding what is happening in chats is important from both research and practical perspectives. From a research perspective, insight into chat content offers a window into student interaction and learning. From a more practical standpoint, insight into chat content can potentially be used to trigger supportive elements in CSCL environments (e.g., context-sensitive tips or conversational agents). The latter requires real-time, and therefore automated, analysis of the chats. Such automated analysis is also helpful from the research perspective, since hand-coding of chats is time-consuming and labour-intensive. In this article, we propose a new machine learning-based system for automated coding of student chats, which we call ConSent. The core of ConSent is an algorithm that uses contextual information and sentence encoding to produce a reliable estimate of the content of a chat message (i.e., its code). To optimize usability, ConSent was designed so that it can cover various topics and various languages. To evaluate our approach, we used two sets of chats covering different topics (within the domain of physics) and written in different languages (Dutch and Portuguese). We tested different algorithm configurations, including two multilingual sentence encoders, to find the model that yields the best reliability. The analysis revealed that ConSent models can reach substantial reliability levels and can transfer reliable coding to chats on a similar topic in a different language.
Finally, we discuss how ConSent can form the basis for a conversational agent, we explain the limitations of our approach, and we indicate possible paths for future work to contribute towards reliable and transferable models.
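To make the two ingredients named above more concrete (contextual information and sentence encoding), the following is a minimal illustrative sketch, not the ConSent implementation. Everything in it is an assumption introduced for illustration: `toy_encode` is a hashed bag-of-words stand-in for a real multilingual sentence encoder, `context_features` uses a hypothetical window of two preceding messages, the chat messages and codes are invented, and Cohen's kappa is only one common way to quantify the agreement between automated and human coding (the abstract does not state which reliability measure was used).

```python
import hashlib
import math
from collections import Counter

DIM = 64  # size of the toy sentence vector (arbitrary, for illustration only)


def toy_encode(sentence):
    """Stand-in for a multilingual sentence encoder.

    Hashes each token into one of DIM buckets and L2-normalises the result.
    A real system would call a pretrained sentence encoder here.
    """
    vec = [0.0] * DIM
    for token in sentence.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def context_features(messages, index, window=2):
    """Encode message `index` together with its `window` preceding messages.

    Missing context at the start of a chat is zero-padded, so every
    message yields a vector of length DIM * (window + 1).
    """
    features = list(toy_encode(messages[index]))
    for offset in range(1, window + 1):
        j = index - offset
        features.extend(toy_encode(messages[j]) if j >= 0 else [0.0] * DIM)
    return features


def cohen_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two equally long label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:  # both coders used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


# Invented example chat and codes (hypothetical, not from the study's data).
chat = [
    "what happens if we increase the voltage",
    "the lamp gets brighter i think",
    "ok let us try it",
]
features = [context_features(chat, i) for i in range(len(chat))]
human_codes = ["a", "a", "b", "b"]
model_codes = ["a", "b", "b", "b"]
kappa = cohen_kappa(human_codes, model_codes)
```

In a setup like this, the context-augmented vectors would be fed to any standard classifier trained on hand-coded messages, and the classifier's output would then be compared with human codes on held-out chats.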