Multimedia Technology and Telecommunications Lab


Unsupervised Domain Adaptation (UDA) for semantic segmentation is the task of adapting a network trained on labeled source data so that it performs well on unlabeled target data. Complex deep neural networks for this task need to be trained on a huge amount of labeled data, which is difficult and expensive to acquire. A recently proposed workaround is to use synthetic data; however, the differences between real-world and synthetic scenes limit the performance. UDA techniques reduce this gap, making it possible to obtain reliable performance on the target domain.

In [1] a novel unsupervised domain adaptation strategy is proposed to adapt a network trained with supervision on synthetic data to real-world data. The learning strategy exploits three components: standard supervised learning on synthetic data, an adversarial learning strategy that exploits both labeled synthetic data and unlabeled real data, and a self-teaching strategy that works on unlabeled data only. The last component is guided by the segmentation confidence, estimated by the fully convolutional discriminator of the adversarial learning module, which helps to further reduce the domain shift between synthetic and real data. Furthermore, we weight this loss on the basis of the class frequencies to enhance the performance on less common classes.
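The self-teaching component described above can be illustrated with a minimal NumPy sketch. It is not the implementation of [1]: the function name, the `threshold` value, and the weighting rule `1 - frequency` are assumptions chosen only to show how a discriminator confidence map can mask a class-weighted cross-entropy on pseudo-labels.

```python
import numpy as np

def self_teaching_loss(probs, confidence, class_freq, threshold=0.2):
    """Masked, class-weighted cross-entropy on pseudo-labels (illustrative).

    probs:      (C, H, W) softmax output of the segmentation network
    confidence: (H, W) discriminator output in [0, 1]
    class_freq: (C,) class frequencies (hypothetical, e.g. counted on source labels)
    threshold:  hypothetical hard threshold on the confidence
    """
    pseudo = probs.argmax(axis=0)                 # (H, W) pseudo-labels
    mask = confidence > threshold                 # keep only pixels deemed reliable
    if not mask.any():
        return 0.0
    # Assumed weighting rule: rarer classes receive larger weights.
    weights = 1.0 - class_freq / class_freq.sum()
    # Probability assigned to the pseudo-label at each pixel.
    p_sel = np.take_along_axis(probs, pseudo[None], axis=0)[0]
    ce = -np.log(np.clip(p_sel, 1e-12, 1.0))
    per_pixel = weights[pseudo] * ce
    return float(per_pixel[mask].mean())
```

Pixels whose confidence falls below the threshold contribute nothing to the loss, so unreliable pseudo-labels do not reinforce themselves during training.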

The approach of [2] builds on [1], but greatly improves the self-teaching component. First, the output of the discriminator is used as a weight applied to the loss function of the self-teaching component at each location, in place of the hard threshold used in [1]. Second, a novel region growing scheme is introduced to extend the reliable regions and better represent their shape (previous approaches tend to almost always discard edge regions and small objects). Finally, since the classes have different frequencies, we also weight the loss coming from unlabeled data on the basis of the frequency of each class in the dataset. This yields a better balance of the results across classes and avoids the dramatic drop in performance on less common classes, which typically correspond to the small objects and structures that are critical for an autonomous vehicle.
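The two ideas of soft weighting and region growing can be sketched as follows. This is a simplified illustration, not the scheme of [2]: the growing rule (4-connected neighbours with the same pseudo-label and a class probability above `grow_thr`) and all names and parameters are assumptions.

```python
import numpy as np

def grow_regions(seed, pseudo, probs_max, grow_thr=0.5, max_iters=50):
    """Grow a boolean seed mask (illustrative rule).

    A pixel is added when a 4-connected neighbour is already accepted,
    carries the same pseudo-label, and the pixel's own class probability
    exceeds grow_thr.
    """
    mask = seed.copy()
    for _ in range(max_iters):
        # Label map of accepted pixels; -1 marks pixels not accepted yet.
        label = np.where(mask, pseudo, -1)
        padded = np.pad(label, 1, constant_values=-1)
        neighbours = (padded[:-2, 1:-1], padded[2:, 1:-1],
                      padded[1:-1, :-2], padded[1:-1, 2:])
        cand = np.zeros_like(mask)
        for nb in neighbours:
            cand |= (nb == pseudo)
        new = cand & ~mask & (probs_max > grow_thr)
        if not new.any():
            break
        mask |= new
    return mask

def soft_weighted_loss(probs, confidence, mask):
    """Cross-entropy on pseudo-labels, weighted per pixel by the confidence."""
    p_sel = probs.max(axis=0)                     # probability of the pseudo-label
    ce = -np.log(np.clip(p_sel, 1e-12, 1.0))
    return float((confidence * ce * mask).sum() / max(int(mask.sum()), 1))
```

Starting from a few high-confidence seeds, the mask expands along regions of consistent prediction, which is one way to recover object borders and small structures that a plain threshold would discard.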

Starting from the architecture of [1] and [2], in [3] we further develop a novel UDA framework in which a standard supervised loss on labeled synthetic data is supported by an adversarial module and a self-training strategy that aim to align the two domain distributions. The improved adversarial module is driven by a pair of fully convolutional discriminators dealing with different domains: the first discriminates between ground truth and generated maps, while the second discriminates between segmentation maps coming from synthetic and real-world data. The self-training module exploits the confidence estimated by the discriminators on unlabeled data to select the regions used to reinforce the learning process. Furthermore, the confidence is thresholded with an adaptive mechanism based on the overall per-class confidence.
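A per-class adaptive threshold of this kind can be sketched as below. The specific rule used here (threshold equal to a scaled mean confidence of the pixels predicted as each class) is an assumption made for illustration, as are the function names and the `scale` and `default` parameters; [3] may define the mechanism differently.

```python
import numpy as np

def adaptive_class_thresholds(confidence, pseudo, num_classes,
                              scale=1.0, default=1.0):
    """Per-class thresholds from the mean confidence of each predicted class.

    confidence: (H, W) discriminator confidence in [0, 1]
    pseudo:     (H, W) integer pseudo-labels
    Classes that are never predicted keep the `default` threshold.
    """
    thr = np.full(num_classes, default, dtype=float)
    for c in range(num_classes):
        sel = pseudo == c
        if sel.any():
            thr[c] = scale * confidence[sel].mean()
    return thr

def select_reliable(confidence, pseudo, thresholds):
    """Boolean mask of pixels whose confidence reaches their class threshold."""
    return confidence >= thresholds[pseudo]
```

With a rule like this, a class that the discriminator rarely trusts is not discarded entirely, because its threshold drops together with its overall confidence.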

In [4] we propose a novel UDA strategy to address the domain shift between real-world and synthetic representations. An adversarial model, based on the cycle consistency framework, performs the mapping between the synthetic and real domains. The data is then fed to a MobileNet-v2 architecture that performs the semantic segmentation task. An additional pair of discriminators, working at the feature level of the MobileNet-v2, better aligns the features of the two domain distributions and further improves the performance. Finally, the consistency of the semantic maps is exploited. After an initial supervised training on synthetic data, the whole UDA architecture is trained end-to-end, considering all its components at once. Experimental results show that the proposed strategy is effective in adapting a segmentation network trained on synthetic data to real-world scenarios. The lightweight MobileNet-v2 architecture allows deployment on devices with limited computational resources, such as the ones employed in autonomous vehicles.
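The cycle consistency idea behind the image-level mapping can be written as an L1 reconstruction penalty in both directions. The sketch below shows only this generic term, with `G` and `F` as hypothetical callables standing in for the synthetic-to-real and real-to-synthetic generators; it does not reproduce the full objective of [4].

```python
import numpy as np

def cycle_consistency_loss(x_syn, y_real, G, F):
    """Generic L1 cycle loss.

    G: callable mapping synthetic images to the real domain (placeholder)
    F: callable mapping real images to the synthetic domain (placeholder)
    """
    # Synthetic -> real -> synthetic should reconstruct the input.
    forward = np.mean(np.abs(F(G(x_syn)) - x_syn))
    # Real -> synthetic -> real should reconstruct the input.
    backward = np.mean(np.abs(G(F(y_real)) - y_real))
    return float(forward + backward)
```

The loss is zero only when the two mappings invert each other on the given samples, which discourages the translation from altering the scene content that the segmentation network relies on.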

The aim of [5] is to give an overview of the recent advancements in the Unsupervised Domain Adaptation (UDA) of deep networks for semantic segmentation. Motivated by the recent growth of interest in this field, we build a comprehensive overview of the proposed methodologies and provide a clear categorization. We start by introducing the problem, its formulation and the various scenarios that can be considered. Then, we introduce the different levels at which adaptation strategies may be applied: at the input (image) level, at the internal feature representation level and at the output level. Furthermore, we present a detailed overview of the literature in the field, dividing previous methods into the following (not mutually exclusive) categories: adversarial learning, generative-based approaches, analysis of the classifier discrepancies, self-teaching, entropy minimization, curriculum learning and multi-task learning. Novel research directions are also briefly introduced to point to interesting open problems in the field. Finally, a comparison of the performance of the various methods in the widely used autonomous driving scenario is presented.

In [6] we propose a novel Unsupervised Domain Adaptation (UDA) strategy based on a feature clustering method that captures the different semantic modes of the feature distribution and groups features of the same class into tight and well-separated clusters. Furthermore, we introduce two novel learning objectives to enhance the discriminative clustering performance: an orthogonality loss forces individual representations that are far apart to be orthogonal, while a sparsity loss reduces the number of active feature channels for each class. The joint effect of these modules is to regularize the structure of the feature space.
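The three objectives can be illustrated on class prototypes (per-class mean feature vectors). The formulations below are simplified stand-ins chosen for this sketch, not the losses defined in [6]: clustering is a mean squared distance to the class prototype, orthogonality penalizes the cosine similarity between different prototypes, and sparsity is the L1 norm of the L2-normalized prototypes. All function names are hypothetical.

```python
import numpy as np

def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class. features: (N, D), labels: (N,)."""
    protos = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        sel = labels == c
        if sel.any():
            protos[c] = features[sel].mean(axis=0)
    return protos

def clustering_loss(features, labels, protos):
    """Mean squared distance of each feature to its class prototype."""
    return float(np.mean(np.sum((features - protos[labels]) ** 2, axis=1)))

def orthogonality_loss(protos, eps=1e-12):
    """Mean squared cosine similarity between different prototypes."""
    unit = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + eps)
    gram = unit @ unit.T
    off_diag = gram - np.diag(np.diag(gram))
    n = len(protos)
    return float(np.sum(off_diag ** 2) / max(n * (n - 1), 1))

def sparsity_loss(protos, eps=1e-12):
    """Mean L1 norm of unit-length prototypes (smallest with one active channel)."""
    unit = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + eps)
    return float(np.mean(np.sum(np.abs(unit), axis=1)))
```

In this simplified form, the clustering term tightens each class around its prototype, while the other two terms push different classes toward distinct, small sets of feature channels.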