Journal of Applied Science and Engineering (Apr 2023)

Multi-modal Human-Computer Virtual Fusion Interaction In Mixed Reality

  • Shengying Jia

DOI
https://doi.org/10.6180/jase.202311_26(11).0010
Journal volume & issue
Vol. 26, no. 11
pp. 1609 – 1618

Abstract


The receptive field of a CNN largely determines its capacity to learn contextual information, yet it is limited by the size of the convolution kernel. Enlarging the receptive field with pooling, meanwhile, discards spatial information from the feature map. To obtain a large receptive field without this loss of information, a deep multimodal-fusion gaze tracking model based on dilated convolution is proposed. Dilated convolution is used to improve ResNet-50, and experiments confirm that it further improves model performance. A comparison with a conventional CNN-based gaze tracking model demonstrates the superiority of the proposed model. To minimize manual intervention, an adaptive target tracking method is adopted to collect training samples automatically. Following the idea of active learning, the algorithm selects the most informative samples from the input stream of training samples (those whose matching confidence, as given by a nearest-neighbor classifier, falls below a set threshold) to construct the perceptual model. A feature invariant to changes in rotation, brightness, and contrast is selected as the target descriptor to enhance the discriminative ability of the perceptual model. The experimental results verify the effectiveness of the multi-modal interactive visual perception method.
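
As a rough illustration of the dilated-convolution idea described above, the following PyTorch sketch shows how a 3x3 convolution with dilation widens the receptive field while keeping the feature map at full resolution; the block structure, channel counts, and dilation rates are illustrative assumptions, not the paper's exact ResNet-50 modification.

```python
import torch
import torch.nn as nn

class DilatedBlock(nn.Module):
    """Residual block using dilated convolution instead of downsampling."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # With a 3x3 kernel, padding = dilation keeps the spatial size
        # unchanged, so no spatial information is lost to pooling or stride.
        self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                              padding=dilation, dilation=dilation, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.bn(self.conv(x)) + x)  # residual connection

# Stacking blocks with growing dilation rates enlarges the effective
# receptive field while the feature map stays at full resolution.
backbone_tail = nn.Sequential(*[DilatedBlock(256, d) for d in (1, 2, 4)])
x = torch.randn(1, 256, 28, 28)
print(backbone_tail(x).shape)  # torch.Size([1, 256, 28, 28])
```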
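The active-learning selection rule can be sketched in the same spirit: a nearest-neighbor classifier scores each incoming sample, and only low-confidence (i.e. most informative) samples update the perceptual model. The confidence mapping, threshold value, and descriptor dimensionality below are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def nn_confidence(sample: np.ndarray, model: np.ndarray) -> float:
    """Matching confidence = similarity to the nearest stored descriptor
    (an assumed mapping of Euclidean distance into (0, 1])."""
    dists = np.linalg.norm(model - sample, axis=1)
    return 1.0 / (1.0 + dists.min())

def select_informative(stream, model, threshold=0.5):
    """Keep only samples whose matching confidence falls below the
    threshold, and add them to the perceptual model (active learning)."""
    selected = []
    for sample in stream:
        if nn_confidence(sample, model) < threshold:
            selected.append(sample)             # low confidence = informative
            model = np.vstack([model, sample])  # grow the perceptual model
    return selected, model

rng = np.random.default_rng(0)
model = rng.normal(size=(10, 32))    # stored target descriptors (assumed dim)
stream = rng.normal(size=(100, 32))  # incoming candidate samples
picked, model = select_informative(stream, model)
print(f"kept {len(picked)} of 100 samples")
```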
