Data in Brief (Apr 2024)
Dataset of computer science course queries from students: Categorized and scored according to Bloom's taxonomy
Abstract
“Why don't students learn?” is a question educators commonly try to address. We believe students become more engaged in the learning process when their natural curiosity is fostered by encouraging them to ask high-level questions. To support this approach, we compiled a dataset of student questions that we hope will aid in training artificial intelligence (AI) models and ultimately improve the learning experience for students. We collected anonymous student questions during the Summer 2023 semester through our online application, “Palta Question”, resulting in a dataset of 8,811 unique questions. Each question underwent basic validation using a keyword-based approach, manual categorization by topic and course content, and complexity assessment using Bloom's taxonomy keywords, which are also included in the dataset. To ensure question uniqueness, we applied the Levenshtein distance algorithm and excluded questions with a high similarity rate. The dataset provides targeted insights into student inquiry patterns and knowledge gaps in the 'Introduction to Computers and Research' and 'Data Structure' courses, and was collected from students at Independent University, Bangladesh (IUB). Its scope is confined to a specific student group and academic context, which limits broader applicability, but it remains valuable for detailed studies in these subjects and serves as a useful foundation for AI-based educational research tools. To demonstrate the usefulness of the dataset, we also used it to train an AI model on basic tasks such as classifying questions by course and topic. Beyond these tasks, we envision researchers using it to enhance education and support students' learning.
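The uniqueness step described above can be illustrated with a minimal sketch. It assumes questions are compared after lowercasing and whitespace normalization, and that a question is dropped when its normalized Levenshtein similarity to an already kept question reaches a threshold. The function names, the normalization, and the 0.9 threshold are illustrative assumptions and are not taken from the dataset's actual pipeline.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]: 1 minus distance divided by the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def normalize(question: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(question.lower().split())


def drop_near_duplicates(questions, threshold=0.9):
    """Keep a question only if it is below the threshold against all kept ones.

    The threshold value is a hypothetical choice for illustration.
    """
    kept, kept_normalized = [], []
    for question in questions:
        candidate = normalize(question)
        if all(similarity(candidate, seen) < threshold for seen in kept_normalized):
            kept.append(question)
            kept_normalized.append(candidate)
    return kept


if __name__ == "__main__":
    sample = [
        "What is a stack?",
        "What is a stack ?",
        "How does a queue work?",
    ]
    print(drop_near_duplicates(sample))
```

This pairwise comparison grows quadratically with the number of questions, which is workable for a collection of a few thousand entries but would need blocking or indexing for much larger ones.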