Journal of King Saud University: Computer and Information Sciences (Sep 2023)
BiTNet: A lightweight object detection network for real-time classroom behavior recognition with transformer and bi-directional pyramid network
Abstract
In this paper, we propose BiTNet, a real-time object detection network that analyzes student behavior in the classroom to provide feedback and help improve teaching quality. BiTNet addresses challenges that current methods face, such as occlusion and small objects in images. Specifically, it incorporates an Efficient Transformer Block (ETB) and an Efficient Convolution Aggregation Block (ECAB) to extract image features. ETB uses convolutional multi-head self-attention (CMHSA) to capture contextual information, which improves its ability to recognize occluded objects, and its homogeneous multi-branch design reduces computational cost. ECAB aggregates the extracted features into a single feature map with a single aggregation operation, improving recognition accuracy while reducing computational cost. The semantic and positional information extracted by the backbone is then fused in a Bi-directional Feature Pyramid Network (BiFPN). BiTNet also employs a smaller-size detection head to recognize small objects. Experimental results on a classroom behavior dataset show that the proposed method achieves high accuracy (82.9%) and speed (256.7 FPS on GPU, 30.7 FPS on CPU, and 18.9 FPS on camera). We further evaluate our method on Pascal VOC 2007 and VisDrone2021, validating BiTNet's generalization and its ability to recognize small objects. In conclusion, BiTNet has a simple yet efficient network structure and feature extraction module that reduce computational cost while maintaining recognition accuracy, making it a strong contender for real-time classroom behavior recognition tasks.
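The abstract does not specify how CMHSA is constructed. As a rough illustration of the general idea only, the sketch below computes multi-head self-attention over a convolutional feature map. It assumes 1x1 convolutions serve as the query, key, value and output projections, and that attention spans all flattened spatial positions. The function names (`conv1x1`, `conv_mhsa`) and weight shapes are hypothetical, and BiTNet's actual CMHSA (for example, how its branches are split or whether keys and values are downsampled) may differ.

```python
import numpy as np

def conv1x1(x, weight):
    # A 1x1 convolution is a per-pixel linear map over channels:
    # x is (C_in, H, W), weight is (C_out, C_in), result is (C_out, H, W).
    return np.einsum("oc,chw->ohw", weight, x)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def conv_mhsa(x, wq, wk, wv, wo, num_heads):
    """Multi-head self-attention over all spatial positions of a (C, H, W) map."""
    channels, height, width = x.shape
    assert channels % num_heads == 0, "channels must divide evenly across heads"
    d = channels // num_heads
    n = height * width
    # Project to queries, keys, values and split channels into heads: (heads, d, N).
    q = conv1x1(x, wq).reshape(num_heads, d, n)
    k = conv1x1(x, wk).reshape(num_heads, d, n)
    v = conv1x1(x, wv).reshape(num_heads, d, n)
    # Scaled dot-product attention between every pair of positions: (heads, N, N).
    attn = softmax(np.einsum("hdn,hdm->hnm", q, k) / np.sqrt(d), axis=-1)
    # Each output position is an attention-weighted sum of values: (heads, d, N).
    out = np.einsum("hnm,hdm->hdn", attn, v)
    # Merge heads back into channels and apply the output projection.
    return conv1x1(out.reshape(channels, height, width), wo)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4, 4))
wq, wk, wv, wo = (0.1 * rng.standard_normal((8, 8)) for _ in range(4))
y = conv_mhsa(x, wq, wk, wv, wo, num_heads=2)
print(y.shape)  # (8, 4, 4)
```

Because every output position attends to every input position, features from the visible parts of a partially occluded student can contribute to the representation of the hidden region. The N x N attention map also shows why the cost of plain self-attention grows quadratically with H x W. This is the kind of computational cost that lightweight designs such as ETB aim to keep down.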
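BiFPN was introduced in EfficientDet. There, each fusion node combines its inputs with learnable non-negative weights using "fast normalized fusion": O = Σ_i w_i I_i / (ε + Σ_j w_j), with ε = 1e-4. The sketch below applies that rule to a hypothetical three-level pyramid (P3 to P5) in one top-down pass followed by one bottom-up pass. It rests on several simplifying assumptions: all levels share the same channel count, resizing uses nearest-neighbour upsampling and 2x2 max pooling, the weights are fixed rather than learned, and the convolution that follows each fusion in EfficientDet is omitted. The abstract does not state how BiTNet configures its BiFPN, so treat this as an illustration of the fusion pattern, not the paper's implementation.

```python
import numpy as np

def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-shaped feature maps with non-negative, normalized weights."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)  # ReLU keeps each weight >= 0
    w = w / (w.sum() + eps)                               # normalize; eps avoids division by 0
    return sum(wi * f for wi, f in zip(w, features))

def upsample2(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def downsample2(x):
    # 2x2 max pooling with stride 2 on a (C, H, W) map (H and W must be even).
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def bifpn_layer(p3, p4, p5):
    """One bidirectional pass over three levels; p3 has the highest resolution."""
    # Top-down path: carry semantic information from coarse to fine levels.
    p4_td = fast_normalized_fusion([p4, upsample2(p5)], [1.0, 1.0])
    p3_out = fast_normalized_fusion([p3, upsample2(p4_td)], [1.0, 1.0])
    # Bottom-up path: carry positional detail from fine to coarse levels.
    p4_out = fast_normalized_fusion([p4, p4_td, downsample2(p3_out)], [1.0, 1.0, 1.0])
    p5_out = fast_normalized_fusion([p5, downsample2(p4_out)], [1.0, 1.0])
    return p3_out, p4_out, p5_out

rng = np.random.default_rng(0)
p3 = rng.standard_normal((16, 32, 32))
p4 = rng.standard_normal((16, 16, 16))
p5 = rng.standard_normal((16, 8, 8))
outs = bifpn_layer(p3, p4, p5)
print([o.shape for o in outs])  # [(16, 32, 32), (16, 16, 16), (16, 8, 8)]
```

The ReLU-and-normalize step gives EfficientDet a cheaper alternative to a softmax over the fusion weights. The extra bottom-up path (absent in a plain top-down FPN) lets high-resolution detail, which matters for small objects, flow back into the coarser levels.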