Alexandria Engineering Journal (Apr 2025)
Intelligent visual question answering in TCM education: An innovative application of IoT and multimodal fusion
Abstract
This paper proposes TCM-VQA IoTNet, a Traditional Chinese Medicine Ancient Text Education Intelligent Visual Question Answering System that integrates Internet of Things (IoT) technology with multimodal learning to understand, and answer questions about, both the images and the textual content of ancient traditional Chinese medicine texts. The system uses the VisualBERT model for multimodal feature extraction and Gated Recurrent Units (GRU) to process time-series data from IoT sensors. An attention mechanism optimizes feature fusion and dynamically adjusts the question answering strategy. Experimental evaluations on the VQA v2.0, CMRC 2018, and Chinese Traditional Medicine datasets show that TCM-VQA IoTNet achieves accuracy rates of 72.7%, 69.%, and 75.4%, respectively, with F1-scores of 70.3%, 67.5%, and 73.9%, significantly outperforming existing mainstream models. TCM-VQA IoTNet also performs well in practical traditional Chinese medicine education settings, where it improves the precision and interactivity of intelligent education. Future research will focus on improving the model’s generalization ability and computational efficiency, and on expanding its application potential in traditional Chinese medicine diagnosis and education.
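The pipeline described in the abstract (a multimodal embedding, a GRU over sensor readings, and attention-weighted fusion) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration: the abstract does not give the layer sizes, the GRU formulation, or the form of the attention, so the sketch uses a standard GRU cell with random weights and scaled dot-product attention, and a random vector stands in for the VisualBERT output. The names `GRUCell` and `attention_fuse` are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these instead."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Standard GRU cell (biases omitted), used here for sensor time series."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One input matrix W and one recurrent matrix U per gate:
        # z = update gate, r = reset gate, n = candidate state.
        self.W = {g: rand_matrix(hidden_size, input_size, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_size, hidden_size, rng) for g in "zrn"}

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in
             zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in
             zip(matvec(self.W["n"], x), matvec(self.U["n"], reset_h))]
        # Blend the previous state and the candidate state with the update gate.
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, sequence):
        """Run the cell over the whole sequence and return the final state."""
        h = [0.0] * self.hidden_size
        for x in sequence:
            h = self.step(x, h)
        return h


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(query, modalities):
    """Scaled dot-product attention over per-modality feature vectors."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, m)) / math.sqrt(dim)
              for m in modalities]
    weights = softmax(scores)
    fused = [sum(w * m[i] for w, m in zip(weights, modalities))
             for i in range(dim)]
    return fused, weights


rng = random.Random(0)
hidden = 8
# Placeholder for a VisualBERT image+text embedding projected to `hidden` dims.
visual_text = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
# Hypothetical IoT readings: 5 time steps of 3 sensor channels each.
sensor_seq = [[rng.uniform(0.0, 1.0) for _ in range(3)] for _ in range(5)]

sensor_vec = GRUCell(3, hidden, rng).encode(sensor_seq)
fused, weights = attention_fuse(visual_text, [visual_text, sensor_vec])
```

In this sketch the multimodal embedding serves as the attention query, so the weights indicate how much each modality contributes to the fused representation that a downstream answer classifier would consume. A real system would replace the random vector with features from a pretrained VisualBERT model and train all weights end to end.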