Multimedia Technology and Telecommunications Lab

ToF and stereo data fusion

Continuous-wave Time-of-Flight (ToF) cameras have attracted considerable attention from both the research community and industry thanks to their ability to robustly measure scene depth in real time. They have been employed in many computer vision applications, including human body tracking, 3D scene reconstruction, robotics, object detection and hand gesture recognition. Their success stems from several benefits: the depth map is estimated with simple processing operations, there are no moving components, the depth map is dense, and the output is free of artifacts due to occlusions and scene texture. Other depth estimation systems, such as Structured Light (SL) and stereo vision, are weaker in these respects, so ToF cameras are preferable in many situations. ToF cameras nevertheless have limitations that call for further analysis and improvement: a low spatial resolution, due to the complexity of the pixel hardware required for depth estimation; a maximum measurable distance; estimation artifacts at edges and corners; and wrong depth estimates caused by the Multi-Path Interference (MPI) phenomenon. In [1,2] and [3] we propose two different approaches to correct MPI:

  • In [1], we use a Convolutional Neural Network (CNN), trained on synthetic data, to estimate the depth error caused by MPI. The CNN takes as input data acquired with a multi-frequency ToF camera and exploits the frequency diversity of MPI, i.e., the fact that the interference depends on the modulation frequency, to estimate it.
    In [2], we show the limitations of training only on synthetic data when the CNN is tested on real data. We then propose a novel unsupervised domain adaptation method that improves ToF denoising performance on real data without using real-world depth ground truth.

  • In [3], we use a modified ToF camera that illuminates the scene with a spatially modulated sine pattern. With this approach it is possible to separate the direct and the global components of the light in the case of diffuse reflections, and thus to correct MPI. The same patterns can also be used to estimate a second depth map of the scene with a structured light approach. Finally, the two depth maps are fused with a Maximum Likelihood approach guided by the estimated sensor noise statistics.
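
As a rough illustration of the fusion step in the last item, the sketch below shows what a per-pixel Maximum Likelihood fusion of two depth maps can look like. It is not the method of [3]: it assumes that the noise on the two depth maps is independent, zero-mean and Gaussian, with a known standard deviation at each pixel, in which case the ML estimate reduces to an inverse-variance weighted average. The function name `fuse_depth_ml` and all numeric values are hypothetical and chosen only for the example.

```python
def fuse_depth_ml(depth_a, sigma_a, depth_b, sigma_b):
    """Fuse two depth maps pixel by pixel.

    Assumption (not taken from [3]): both measurements are corrupted by
    independent zero-mean Gaussian noise with known per-pixel standard
    deviations. Under that assumption the ML depth estimate is the
    inverse-variance weighted average of the two measurements.

    All arguments are 2D lists (rows of floats) with the same shape;
    depths and standard deviations share the same unit (e.g. metres).
    """
    fused = []
    for row_da, row_sa, row_db, row_sb in zip(depth_a, sigma_a, depth_b, sigma_b):
        fused_row = []
        for da, sa, db, sb in zip(row_da, row_sa, row_db, row_sb):
            weight_a = 1.0 / (sa * sa)  # inverse variance of measurement A
            weight_b = 1.0 / (sb * sb)  # inverse variance of measurement B
            fused_row.append((weight_a * da + weight_b * db) / (weight_a + weight_b))
        fused.append(fused_row)
    return fused


# Hypothetical 2x2 depth maps (metres) and per-pixel noise estimates.
tof_depth = [[2.00, 2.10], [1.50, 3.00]]
tof_sigma = [[0.01, 0.01], [0.02, 0.05]]
sl_depth = [[2.04, 2.10], [1.50, 2.90]]
sl_sigma = [[0.01, 0.03], [0.02, 0.05]]

fused = fuse_depth_ml(tof_depth, tof_sigma, sl_depth, sl_sigma)
```

Where the two sensors are equally reliable, the fused value is the plain average of the two depths; where one has a smaller standard deviation, the result moves towards that measurement.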

Related Papers:

[1]   G. Agresti and P. Zanuttigh, Deep learning for multi-path error removal in ToF sensors, Proceedings of the European Conference on Computer Vision Workshop (ECCVW): Geometry Meets Deep Learning, Munich, Germany, 2018.

[2]   G. Agresti, H. Schaefer, P. Sartor and P. Zanuttigh, Unsupervised domain adaptation for ToF data denoising with adversarial learning, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, USA, 2019.

[3]   G. Agresti and P. Zanuttigh, Combination of spatially-modulated ToF and structured light for MPI-free depth estimation, Proceedings of the European Conference on Computer Vision Workshop (ECCVW): 3D Reconstruction in the Wild, Munich, Germany, 2018.