International Journal of Advanced Robotic Systems (Aug 2023)

Visual–tactile fusion object classification method based on adaptive feature weighting

  • Peng Zhang,
  • Lu Bai,
  • Dongri Shan,
  • Xiaofang Wang,
  • Shuang Li,
  • Wenkai Zou,
  • Zhenxue Chen

DOI: https://doi.org/10.1177/17298806231191947
Journal volume & issue: Vol. 20

Abstract

Visual–tactile fusion information plays a crucial role in robotic object classification. The fusion modules of existing visual–tactile models directly concatenate visual and tactile features at the feature layer; however, the contributions of visual and tactile features to classification differ from object to object. Direct concatenation may therefore overlook the features most beneficial for classification, and it also increases computational cost and reduces classification efficiency. To utilize object feature information more effectively and further improve the efficiency and accuracy of robotic object classification, we propose in this article a visual–tactile fusion object classification method based on adaptive feature weighting. First, a lightweight feature extraction module extracts the visual and tactile features of each object. Then, the two feature vectors are fed into an adaptive weighted fusion module. Finally, the fused feature vector is passed to a fully connected layer for classification, yielding the categories and physical attributes of the objects. Extensive experiments were performed on the public Penn Haptic Adjective Corpus 2 dataset and the newly developed Visual-Haptic Adjective Corpus 52 dataset. On Penn Haptic Adjective Corpus 2, our method achieves an area under the curve (AUC) of 0.9750, a 1.92% improvement over the highest AUC reported by existing state-of-the-art methods, and it also achieves the shortest training and inference times among them. On the new Visual-Haptic Adjective Corpus 52 dataset, our method achieves an AUC of 0.9827 and an accuracy of 0.9850, with an inference time of 1.559 s per sample, demonstrating the effectiveness of the proposed method.
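
The abstract does not include code, but the core idea it describes, weighting each modality adaptively instead of concatenating features, can be illustrated with a minimal PyTorch sketch. This is a hypothetical reconstruction, not the authors' implementation: the module name AdaptiveWeightedFusion, the 128-dimensional features, the 52-way output, and the softmax gating network are all assumptions chosen for illustration.

```python
# Minimal sketch of adaptive weighted fusion (hypothetical, not the
# authors' code): a small gating network predicts one weight per
# modality, which scales the visual and tactile feature vectors
# before they are combined and classified.
import torch
import torch.nn as nn

class AdaptiveWeightedFusion(nn.Module):
    def __init__(self, feat_dim: int = 128, num_classes: int = 52):
        super().__init__()
        # The gate maps the concatenated features to two modality
        # weights that sum to 1 (softmax over the last dimension).
        self.gate = nn.Sequential(
            nn.Linear(2 * feat_dim, 2),
            nn.Softmax(dim=-1),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, visual: torch.Tensor, tactile: torch.Tensor):
        # visual, tactile: (batch, feat_dim) vectors produced by the
        # lightweight feature extraction modules.
        w = self.gate(torch.cat([visual, tactile], dim=-1))  # (batch, 2)
        # A weighted sum keeps the fused vector at feat_dim, so the
        # classifier is cheaper than one fed a 2*feat_dim concatenation.
        fused = w[:, 0:1] * visual + w[:, 1:2] * tactile
        return self.classifier(fused)

# Example usage with random features
model = AdaptiveWeightedFusion()
v = torch.randn(4, 128)
t = torch.randn(4, 128)
logits = model(v, t)  # shape: (4, 52)
```

Under these assumptions, the learned weights let the model emphasize whichever modality is more informative for a given object, and the fixed-size fused vector is consistent with the abstract's claim that avoiding direct concatenation reduces computational cost.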