Journal of King Saud University: Science (Mar 2024)

Decomposed channel based Multi-Stream Ensemble: Improving consistency targets in semi-supervised 2D pose estimation

  • Jiaqi Wu,
  • Junbiao Pang,
  • Qingming Huang

Journal volume & issue
Vol. 36, no. 3
p. 103078

Abstract

Objectives: In pose estimation, semi-supervised learning is a crucial approach to overcoming the shortage of labeled data. However, semi-supervised learning is itself severely affected when labeled samples are scarce: the fewer the labeled data, the less stable the predictions. Deep ensembles are a good way to improve model accuracy and stability, but training an ensemble of models takes a long time and consumes substantial resources, so it cannot be applied in many practical scenarios. Methods: We propose the Decomposed Channel based Multi-Stream Ensemble (DCMSE) network, which extends a single model to a stream-ensemble structure and generates an ensemble prediction to reduce the large prediction variance caused by the lack of labeled data and to improve performance. The Channel Deconstruction and Ensembling (CDE) module lets the network benefit from both diversity and commonality by implementing the ensemble without increasing the number of parameters. The output features are split into two parts, common channels and private channels. In feature sampling, common channels provide commonality between streams, while private channels provide diversity for each stream and prevent the streams' predictions from becoming homogeneous. Together, diversity and commonality allow the network not only to gain from the ensemble of streams but also to improve the prediction accuracy of each individual stream. Results: Moreover, we propose mean-stream and cross-stage consistency constraints to obtain gains from unlabeled data. The Mean-Stream (MS) consistency constraint uses the multi-stream ensemble prediction to additionally supervise each stream. Based on the characteristics of the Stacked Hourglass model, the Cross-Stage (CS) consistency constraint uses the predictions of later stages to supervise the predictions of earlier stages.
Conclusion: Our approach achieves better results than state-of-the-art (SOTA) methods on the FLIC and Openfield-Pranav datasets and on our Sniffing dataset. Specifically, in terms of MSE, our method improves on the SOTA method by at least 0.88, 0.13, and 0.08 on the FLIC, Openfield-Pranav, and Sniffing datasets, respectively.
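
The common/private channel split and the mean-stream consistency idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how channels are assigned, what the common-to-private ratio is, or which distance the consistency term uses. The function names, the even split of private channels, and the use of MSE as the consistency measure are all assumptions made for illustration.

```python
def split_channels(channels, num_common, num_streams):
    """Assumed scheme: the first `num_common` channels are shared by all
    streams; the remaining (private) channels are divided evenly."""
    common = channels[:num_common]
    private = channels[num_common:]
    if len(private) % num_streams != 0:
        raise ValueError("private channels must divide evenly across streams")
    size = len(private) // num_streams
    # Stream i sees every common channel plus its own private slice.
    return [common + private[i * size:(i + 1) * size]
            for i in range(num_streams)]


def mean_stream_target(predictions):
    """Element-wise mean of the per-stream predictions (ensemble target)."""
    n = len(predictions)
    return [sum(values) / n for values in zip(*predictions)]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mean_stream_consistency(predictions):
    """Assumed MS term: MSE of each stream against the ensemble mean."""
    target = mean_stream_target(predictions)
    return [mse(p, target) for p in predictions]


# Six hypothetical channels: two common, four private, two streams.
streams = split_channels(["c0", "c1", "p0", "p1", "p2", "p3"], 2, 2)
# Two hypothetical flattened stream predictions.
losses = mean_stream_consistency([[0.0, 2.0], [2.0, 4.0]])
```

Here each stream receives both common channels and two of the four private channels, so no parameters are duplicated across streams, and each stream's prediction is compared with the average over all streams.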

Keywords