Jisuanji Kexue (Computer Science), Nov 2022

Variational Domain Adaptation Driven Semantic Segmentation of Urban Scenes

  • JIN Yu-jie, CHU Xu, WANG Ya-sha, ZHAO Jun-feng

DOI
https://doi.org/10.11896/jsjkx.220500193
Journal volume & issue
Vol. 49, no. 11
pp. 126–133

Abstract


Semantic segmentation of urban scenes aims to identify and segment pedestrians, obstacles, roads, signs and other elements in an image, providing vehicles with information about free space on the road; it is one of the key technologies of autonomous driving. High-performance semantic segmentation systems rely heavily on large amounts of real annotated data for training. However, labeling every pixel in an image is costly and often impractical. One alternative is to collect photo-realistic synthetic data from video games, where pixel-level annotations can be generated automatically at low cost, and use it to train a machine learning model to segment real-world images; this setting corresponds to domain adaptation. Unlike current mainstream semantic segmentation domain adaptation methods based on Vapnik-Chervonenkis dimension theory or Rademacher complexity theory, our method is inspired by a PAC-Bayes upper bound on the target-domain Gibbs risk that is compatible with pseudo labels. It considers the average case over the hypothesis space rather than the worst case, thereby avoiding over-constraining the domain discrepancy in the latent space, which would make the upper bound on the target-domain generalization error impossible to estimate and optimize effectively. Guided by these ideas, this paper proposes a variational inference method for semantic segmentation adaptation (VISA). The dropout variational family is used for variational inference: while solving for the ideal posterior distribution over the hypothesis space, an approximate Bayes classifier can be obtained quickly, and the estimate of the risk upper bound is made more accurate by minimizing entropy on the target domain and filtering pixels. Experiments show that the mean intersection over union (mIoU) of VISA is 0.5% to 6.6% higher than that of baseline methods, with high accuracy on pedestrians, vehicles and other urban scene elements.
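The three ingredients the abstract names can be illustrated in miniature: dropout defines the variational family (sampling dropout masks samples hypotheses from the approximate posterior), averaging softmax outputs over those samples yields an approximate Bayes classifier, and target-domain entropy is computed only over confidently predicted pixels. The sketch below is illustrative only, assuming a toy per-pixel linear "segmentation head"; all names (`mc_dropout_predict`, `filtered_entropy`, the threshold value) are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=0):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def forward(x, w, drop_mask, keep_prob=0.5):
    """Toy per-pixel linear head: (C, 3) weights applied to a (3, H, W)
    image, with inverted dropout on the output features."""
    logits = np.einsum('fc,chw->fhw', w, x)            # (C, H, W)
    logits = np.maximum(logits, 0.0)                   # ReLU
    return logits * drop_mask[:, None, None] / keep_prob

def mc_dropout_predict(x, w, n_samples=16):
    """Average softmax outputs over sampled dropout masks: each mask is
    one hypothesis drawn from the dropout variational family, so the
    average approximates the Bayes classifier's predictive distribution."""
    probs = []
    for _ in range(n_samples):
        mask = (rng.random(w.shape[0]) > 0.5).astype(float)
        probs.append(softmax(forward(x, w, mask), axis=0))
    return np.mean(probs, axis=0)                      # (C, H, W)

def filtered_entropy(probs, conf_threshold=0.5):
    """Mean per-pixel prediction entropy, restricted to pixels whose
    top-class confidence exceeds the threshold (pixel filtering)."""
    ent = -(probs * np.log(probs + 1e-8)).sum(axis=0)  # (H, W)
    keep = probs.max(axis=0) > conf_threshold
    return float(ent[keep].mean()) if keep.any() else 0.0

# Usage: an unlabeled "target-domain" image and random toy weights.
x = rng.standard_normal((3, 8, 8))   # 3-channel, 8x8 image
w = rng.standard_normal((5, 3))      # 5 classes
p = mc_dropout_predict(x, w)
h = filtered_entropy(p)
```

In a real adaptation loop, `filtered_entropy` would be a differentiable loss term minimized on target-domain batches alongside the supervised loss on the synthetic source domain; here it is computed once to show the quantity itself.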

Keywords