Introduction

Unsupervised domain adaptation for semantic segmentation is the task of adapting a network trained on labeled source data so that it performs well on unlabeled target data. Deep neural networks for this task need a large amount of labeled training data, which is difficult and expensive to acquire. A recently proposed workaround is to train on synthetic data; however, the differences between real-world and synthetic scenes limit performance.

Unsupervised Domain Adaptation for Semantic Segmentation of Urban Scenes [1]

The semantic understanding of urban scenes is one of the key components of an autonomous driving system. Complex deep neural networks for this task must be trained with a huge amount of labeled data, which is difficult and expensive to acquire. A recently proposed workaround is the use of synthetic data; however, the differences between real world and synthetic scenes limit performance. We propose an unsupervised domain adaptation strategy to adapt a synthetic supervised training to real world data. The proposed learning strategy exploits three components: a standard supervised learning on synthetic data, an adversarial learning strategy able to exploit both labeled synthetic data and unlabeled real data, and finally a self-teaching strategy working on unlabeled data only. The last component is guided by the segmentation confidence, estimated by the fully convolutional discriminator of the adversarial learning module, helping to further reduce the domain shift between synthetic and real data. Furthermore, we weight this loss on the basis of the class frequencies in order to enhance performance on less common classes. Experimental results prove the effectiveness of the proposed strategy in adapting a segmentation network trained on synthetic datasets, like GTA5 and SYNTHIA, to a real dataset such as Cityscapes.
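As a rough illustration of the self-teaching component described above, the following Python sketch keeps only the target pixels whose discriminator confidence exceeds a threshold, uses the network's own arg-max prediction as a pseudo-label, and weights each pixel's cross-entropy by the rarity of its class. The function name, the threshold value, and the weight form `1 - class_freq[c]` are illustrative assumptions, not the released implementation.

```python
import math


def self_teaching_loss(probs, confidence, class_freq, threshold=0.2):
    """Confidence-masked, class-weighted cross-entropy on pseudo-labels.

    probs:      per-pixel class probabilities (list of lists), i.e. the
                softmax output of the segmentation network on a target image.
    confidence: per-pixel confidence in [0, 1] from the discriminator.
    class_freq: fraction of pixels belonging to each class (assumed to be
                estimated on the labeled synthetic data).
    threshold:  hypothetical confidence cut-off; pixels at or below it are ignored.
    """
    total, kept = 0.0, 0
    for p, conf in zip(probs, confidence):
        if conf <= threshold:
            continue  # unreliable pixel: it does not contribute to the loss
        pseudo = max(range(len(p)), key=p.__getitem__)  # arg-max pseudo-label
        weight = 1.0 - class_freq[pseudo]  # assumed form: rarer class, larger weight
        total += -weight * math.log(p[pseudo] + 1e-12)
        kept += 1
    # Average over the selected pixels; zero if nothing passed the threshold.
    return total / kept if kept else 0.0
```

With this weighting, a confident pixel of a rare class contributes more to the loss than an equally confident pixel of a dominant class such as road or sky.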

The full paper can be downloaded from here.

The code for the training and the evaluation of the proposed method is available here.

Adversarial Learning and Self-Teaching Techniques for Domain Adaptation in Semantic Segmentation [2]

Deep learning techniques have been widely used in autonomous driving systems for the semantic understanding of urban scenes. However, they need a huge amount of labeled data for training, which is difficult and expensive to acquire. A recently proposed workaround is to train deep networks using synthetic data, but the domain shift between real world and synthetic representations limits the performance. In this work, a novel Unsupervised Domain Adaptation (UDA) strategy is introduced to solve this issue. The proposed learning strategy is driven by three components: a standard supervised learning loss on labeled synthetic data; an adversarial learning module that exploits both labeled synthetic data and unlabeled real data; finally, a self-teaching strategy applied to unlabeled data. The last component exploits a region growing framework guided by the segmentation confidence. Furthermore, we weighted this component on the basis of the class frequencies to enhance the performance on less common classes. Experimental results prove the effectiveness of the proposed strategy in adapting a segmentation network trained on synthetic datasets, like GTA5 and SYNTHIA, to real world datasets like Cityscapes and Mapillary.
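The region growing idea mentioned above can be sketched as follows. This is a minimal example under stated assumptions: highly confident pixels act as seeds, and the selected region expands to 4-connected neighbours that share the same predicted class and pass a lower confidence threshold. The function name, both threshold values, and the 4-connectivity are hypothetical choices, not details taken from the paper.

```python
from collections import deque


def grow_confident_mask(labels, confidence, seed_thr=0.9, grow_thr=0.5):
    """Return a boolean mask of the pixels selected for self-teaching.

    labels:     2D list of predicted class indices.
    confidence: 2D list of per-pixel confidences in [0, 1].
    seed_thr:   hypothetical threshold for seed pixels.
    grow_thr:   hypothetical (lower) threshold for pixels added while growing.
    """
    h, w = len(labels), len(labels[0])
    # Seeds: pixels whose confidence already exceeds the high threshold.
    mask = [[confidence[y][x] >= seed_thr for x in range(w)] for y in range(h)]
    queue = deque((y, x) for y in range(h) for x in range(w) if mask[y][x])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                same_class = labels[ny][nx] == labels[y][x]
                if same_class and confidence[ny][nx] >= grow_thr:
                    mask[ny][nx] = True  # absorb the neighbour into the region
                    queue.append((ny, nx))
    return mask
```

Compared with a single hard threshold, growing from seeds lets moderately confident pixels contribute when they are spatially attached to a reliable region of the same class.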

The preprint paper can be downloaded from here.

The code for the training and the evaluation of the proposed method is available here.

The method is illustrated in Figure 1.

The main quantitative and qualitative results are reported in Table 1 and Figure 2.

Fig. 1: Architecture of the proposed framework. The combination of 3 losses is employed: a standard cross-entropy loss on synthetic data $\mathcal{L}_{G,1}$, an adversarial loss $\mathcal{L}^{s,t}_{G,2}$ and a self-teaching loss for unlabeled real data $\mathcal{L}_{G,3}$.
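One plausible way to read the combination of the three losses in Fig. 1 is as a weighted sum. The weights $\lambda_{2}$ and $\lambda_{3}$ below are hypothetical balancing coefficients introduced here only for illustration; the actual weighting scheme is given in the paper.

$$
\mathcal{L}_{G} = \mathcal{L}_{G,1} + \lambda_{2}\,\mathcal{L}^{s,t}_{G,2} + \lambda_{3}\,\mathcal{L}_{G,3}
$$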

Fig. 2: Semantic segmentation of some sample scenes extracted from the Cityscapes (a) and Mapillary (b) validation datasets. The first group of six rows is related to the Cityscapes dataset, the last six to the Mapillary dataset. For each group, the first three rows are related to the experiments in which the GTA5 dataset is used as source. The last three rows are related to the case in which the SYNTHIA dataset is used as source.

Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training [3]

Unsupervised Domain Adaptation (UDA) aims at improving the generalization capability of a model trained on a source domain to perform well on a target domain for which no labeled data is available. In this paper, we consider the semantic segmentation of urban scenes and we propose an approach to adapt a deep neural network trained on synthetic data to real scenes addressing the domain shift between the two different data distributions. We introduce a novel UDA framework where a standard supervised loss on labeled synthetic data is supported by an adversarial module and a self-training strategy aiming at aligning the two domain distributions. The adversarial module is driven by a couple of fully convolutional discriminators dealing with different domains: the first discriminates between ground truth and generated maps, while the second between segmentation maps coming from synthetic or real world data. The self-training module exploits the confidence estimated by the discriminators on unlabeled data to select the regions used to reinforce the learning process. Furthermore, the confidence is thresholded with an adaptive mechanism based on the per-class overall confidence. Experimental results prove the effectiveness of the proposed strategy in adapting a segmentation network trained on synthetic datasets like GTA5 and SYNTHIA, to real world datasets like Cityscapes and Mapillary.
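The abstract does not spell out the adaptive rule, so the following Python sketch only illustrates the general idea of a class-wise threshold. It assumes that the threshold of each class is the mean confidence of the pixels currently predicted as that class, and that a pixel is kept when its confidence exceeds the threshold of its own class. Both function names and the mean-based rule are assumptions.

```python
def per_class_thresholds(labels, confidence, num_classes, default=1.0):
    """Mean confidence per predicted class (assumed adaptive rule).

    labels:     flat list of predicted class indices.
    confidence: flat list of per-pixel confidences in [0, 1].
    default:    threshold used for classes with no predicted pixel.
    """
    sums = [0.0] * num_classes
    counts = [0] * num_classes
    for c, conf in zip(labels, confidence):
        sums[c] += conf
        counts[c] += 1
    return [sums[c] / counts[c] if counts[c] else default
            for c in range(num_classes)]


def select_pixels(labels, confidence, thresholds):
    """Keep a pixel only if it beats the threshold of its predicted class."""
    return [conf > thresholds[c] for c, conf in zip(labels, confidence)]
```

Recomputing the thresholds as training proceeds would make the selection time-varying: as the overall confidence of a class rises, so does the bar a pixel of that class has to clear.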

The preprint paper can be downloaded from here.

The code for the training and the evaluation of the proposed method is available here.

The method is illustrated in Figure 3.

The main quantitative and qualitative results are reported in Table 1 and Figure 4.

Fig. 3: Architecture of the proposed approach. The semantic segmentation network $G$ is trained with the combination of $4$ losses: a supervised cross entropy on source data $\mathcal{L}_{G,0}$, a double adversarial framework $\mathcal{L}^{s,t}_{G,1}$ and $\mathcal{L}_{G,2}^t$, and a self-training module $\mathcal{L}_{G,3}$ with class-wise and time-varying adaptive thresholding mask $\mathcal{T}_f$.
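Assuming the four terms of Fig. 3 are combined linearly, the overall objective for $G$ could be written as below. The coefficients $\lambda_{1}$, $\lambda_{2}$ and $\lambda_{3}$ are hypothetical balancing weights that do not appear in the caption.

$$
\mathcal{L}_{G} = \mathcal{L}_{G,0} + \lambda_{1}\,\mathcal{L}^{s,t}_{G,1} + \lambda_{2}\,\mathcal{L}^{t}_{G,2} + \lambda_{3}\,\mathcal{L}_{G,3}
$$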

Table 1: Per-class and mean IoU on the four considered UDA scenarios. The approaches have been trained in a supervised way on the synthetic dataset and the unsupervised domain adaptation has been performed using the respective real world training set. The results are reported on the real world validation sets.
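For readers unfamiliar with the metric, per-class IoU is the number of pixels correctly assigned to a class divided by the number of pixels that are either predicted or annotated as that class, and mean IoU averages this over the classes. The sketch below is a generic illustration on flattened label lists, not the evaluation script used for Table 1. Skipping classes that are absent from both prediction and ground truth is one common convention and is an assumption here.

```python
def per_class_iou(pred, target, num_classes):
    """Intersection over union for each class, on flat label lists.

    Returns None for classes absent from both prediction and ground truth.
    """
    inter = [0] * num_classes
    pred_count = [0] * num_classes
    target_count = [0] * num_classes
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[p] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - inter[c]
        ious.append(inter[c] / union if union else None)
    return ious


def mean_iou(ious):
    """Average the IoU over the classes that actually occur."""
    valid = [v for v in ious if v is not None]
    return sum(valid) / len(valid) if valid else 0.0
```

For example, with `pred = [0, 0, 1, 1]` and `target = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3.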

Fig. 4: Semantic segmentation of some sample scenes extracted from the Cityscapes (a) and Mapillary (b) validation datasets. The first group of four rows is related to the Cityscapes dataset, the last four to the Mapillary dataset. For each group, the first two rows are related to the experiments in which the GTA5 dataset is used as source. The last two rows are related to the case in which the SYNTHIA dataset is used as source.

Contacts

For further information on the method, you can contact lttm@dei.unipd.it.

Have a look at our website http://lttm.dei.unipd.it for other works on this topic.

References

[1] M. Biasetton, U. Michieli, G. Agresti, P. Zanuttigh, "Unsupervised Domain Adaptation for Semantic Segmentation of Urban Scenes", Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW): Workshop on Autonomous Driving (WAD), Long Beach, USA, 2019.

[2] U. Michieli, M. Biasetton, G. Agresti, P. Zanuttigh, "Adversarial Learning and Self-Teaching Techniques for Domain Adaptation in Semantic Segmentation", IEEE Transactions on Intelligent Vehicles (T-IV), 2020.

[3] T. Spadotto, M. Toldo, U. Michieli, P. Zanuttigh, "Unsupervised Domain Adaptation with Multiple Domain Discriminators and Adaptive Self-Training", International Conference on Pattern Recognition (ICPR), 2020.
