IEEE Access (Jan 2024)

Perceptual Visual Feature Learning With Applications in Sports Educational Image Understanding

  • Tengsheng Liu,
  • Minghui Xu

DOI
https://doi.org/10.1109/ACCESS.2024.3377657
Journal volume & issue
Vol. 12
pp. 41168 – 41179

Abstract

Read online

Effectively understanding the semantics of sophisticated sceneries is a key module in plenty of artificial intelligence (AI) systems. In this article, we optimally fuse multi-channel perceptual visual features for recognizing scenic pictures with complex spatial configurations, focusing on formulating a deep hierarchical model to actively discover human gaze allocation. In detail, to uncover semantically/visually important patches within each scenery, we utilize the BING objectness descriptor to rapidly and accurately localize multi-scale objects or their components. Subsequently, a local-global feature fusion scenario is proposed to dynamically combine the multiple low-level features from multiple scenic patches. To simulate how humans perceiving semantically/visually important scenic patches, we design a robust deep active learning (RDAL) paradigm that sequentially derives gaze shift path (GSP) and hierarchically learns deep GSP features in a unified architecture. Notably, the key advantage of RDAL is the high tolerance of label noise by adding an elaborately-designed sparse penalty. That is, the contaminated and redundant deep GSP features can be implicitly abandoned. Finally, the refined deep GSP features are integrated into a multi-label SVM for recognizing sceneries of different categories. Empirical comparisons showed that: 1) our method performs competitively on six generic scenery set (average accuracy 2%~4.3% higher than the second best performer), and 2) our deep GSP feature is particularly discriminative to our compiled sport educational image set (average accuracy 7.7% higher than the second best performer).

Keywords