IEEE Access (Jan 2022)

MBMT-Net: A Multi-Task Learning Based Convolutional Neural Network Architecture for Dense Prediction Tasks

  • George Ciubotariu
  • Gabriela Czibula

DOI
https://doi.org/10.1109/ACCESS.2022.3225746
Journal volume & issue
Vol. 10
pp. 125600 – 125615

Abstract

Recently proposed improvements in the field of Computer Vision focus on enhancing the feature processing capabilities of Single-Task Convolutional Neural Networks. A typical Single-Task network consists of a backbone and a head, where the feature extractor (the backbone) is usually optimised using the gradient provided by the head. Inevitably, the backbone specialises for the given task. This approach does not scale well when multiple tasks must be learned at once from the same input. As a response, there is increasing interest in Multi-Task formulations. However, since most Multi-Task architectures employ a single shared backbone, propagating the gradients of different tasks back into it can oversaturate it. This problem may be addressed using Multi-Backbone feature extractors. As a strategy to compensate for these shortcomings, we introduce MBMT-Net, a Multi-Backbone-Multi-Task-Network architecture built on a development strategy that infuses backbones with more diverse and specialised processing capabilities. MBMT-Net consists of parallel pre-trained backbones whose outputs are concatenated and passed to the Multi-Task heads, which benefit from richer and more diverse features while using fewer network parameters than traditional Multi-Task architectures. Our strategy is architecture-independent: it can be applied to different types of backbones and parsing heads, which greatly extends the range of configurable features, enhances existing Single- and Multi-Task model building strategies, and outperforms them when the Multi-Backbone design is used. As a result, despite having 12.16M fewer parameters, MBMT-Net reaches state-of-the-art performance and surpasses the previous best Multi-Task model for semantic segmentation in terms of Mean Intersection over Union (mIoU) when evaluated on the NYUv2 data set.
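The core data flow described in the abstract (several backbones run in parallel on the same input, their outputs are concatenated along the channel axis, and every task head reads the shared concatenated features) can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: tensors are replaced by nested stdlib lists, the "backbones" are trivial stand-in functions rather than pre-trained CNNs, the heads are hypothetical 1x1-convolution-style linear maps rather than real parsing heads, and all backbone outputs are assumed to share the same spatial resolution. The names `concat_channels`, `pointwise_head`, and `MultiBackboneMultiTask` are hypothetical.

```python
from typing import Callable, Dict, List

# Toy representation (an assumption, not the paper's): an image is a flat list of
# H*W pixel values, and a feature map is a list of channels, each a flat list of H*W values.
Image = List[float]
FeatureMap = List[List[float]]
Backbone = Callable[[Image], FeatureMap]
Head = Callable[[FeatureMap], FeatureMap]


def concat_channels(maps: List[FeatureMap]) -> FeatureMap:
    """Concatenate feature maps along the channel axis; spatial sizes must match."""
    size = len(maps[0][0])
    out: FeatureMap = []
    for fmap in maps:
        for channel in fmap:
            if len(channel) != size:
                raise ValueError("backbone outputs must share the same spatial size")
            out.append(list(channel))
    return out


def pointwise_head(weights: List[List[float]], bias: List[float]) -> Head:
    """Hypothetical 1x1-convolution-style head: each output channel is a
    per-pixel weighted sum of all input channels plus a bias."""
    def head(features: FeatureMap) -> FeatureMap:
        size = len(features[0])
        out: FeatureMap = []
        for w_row, b in zip(weights, bias):
            if len(w_row) != len(features):
                raise ValueError(f"head expects {len(w_row)} input channels, got {len(features)}")
            out.append([b + sum(w * ch[i] for w, ch in zip(w_row, features))
                        for i in range(size)])
        return out
    return head


class MultiBackboneMultiTask:
    """Runs every backbone on the same input, concatenates their outputs,
    and feeds the shared concatenated features to every task head."""

    def __init__(self, backbones: List[Backbone], heads: Dict[str, Head]) -> None:
        self.backbones = backbones
        self.heads = heads

    def __call__(self, image: Image) -> Dict[str, FeatureMap]:
        shared = concat_channels([backbone(image) for backbone in self.backbones])
        return {task: head(shared) for task, head in self.heads.items()}


# Usage with two stand-in "backbones" (1 and 2 channels -> 3 concatenated channels).
def backbone_a(img: Image) -> FeatureMap:
    return [list(img)]


def backbone_b(img: Image) -> FeatureMap:
    return [[2.0 * v for v in img], [v + 1.0 for v in img]]


model = MultiBackboneMultiTask(
    backbones=[backbone_a, backbone_b],
    heads={
        "segmentation": pointwise_head([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0]),
        "depth": pointwise_head([[0.5, 0.5, 0.5]], [0.0]),
    },
)
outputs = model([1.0, 2.0])
print(outputs)  # {'segmentation': [[1.0, 2.0], [2.0, 4.0]], 'depth': [[2.5, 4.5]]}
```

The design point the sketch tries to make visible is that each head's input width equals the sum of the backbones' channel counts, so every task sees the features of every backbone. In a deep-learning framework the concatenation step would typically be a single call such as PyTorch's `torch.cat(features, dim=1)` on NCHW tensors.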

Keywords