的个人主页 http://faculty.dlut.edu.cn/yexinchen/zh_CN/index.htm
Unsupervised Monocular Depth Estimation via Recursive Stereo Distillation
Xinchen Ye, Xin Fan*, Mingliang Zhang, Wei Zhong, Rui Xu
Dalian University of Technology
* Corresponding author
Code: https://github.com/goldenwoman/Recursive_Stereo_Disill
Paper: bare_jrnl.pdf
Abstract
Existing unsupervised monocular depth estimation methods resort to stereo image pairs instead of ground-truth depth maps as supervision to predict scene depth. Constrained by the type of monocular input in testing phase, they fail to fully exploit the stereo information through the network during training, leading to the unsatisfactory performance of depth estimation. Therefore, we propose a novel architecture which consists of a monocular network (Mono-Net) that infers depth maps from monocular inputs, and a stereo network (Stereo-Net) that further excavates the stereo information by taking stereo pairs as input. During training, the sophisticated Stereo-Net guides the learning of Mono-Net and devotes to enhance the performance of Mono-Net without changing its network structure and increasing its computational burden. Thus, monocular depth estimation with superior performance and fast runtime can be achieved in testing phase by only using the lightweight Mono-Net. For the proposed framework, our core idea lies in: 1) how to design the Stereo-Net so that it can accurately estimate depth maps by fully exploiting the stereo information; 2) how to use the sophisticated Stereo-Net to improve the performance of Mono-Net. To this end, we propose a recursive estimation and refinement strategy for Stereo-Net to boost its performance of depth estimation. Meanwhile, a multi-space knowledge distillation scheme is designed to help Mono-Net amalgamate the knowledge and master the expertise from Stereo-Net in a multi-scale fashion. Experiments demonstrate that our method achieves the superior performance of monocular depth estimation in comparison with other state-of-the-art methods.
Method
Figure 1. Network overview. It includes a Mono-Net M and a Stereo-Net S, where M is a lightweight network that takes a single image as input, while S takes stereo images pair as input. S contains a recursive estimation strategy and a feature-driven adaptive refinement module to further improve the accuracy of depth estimation. The multi-space knowledge distillation scheme is designed to distill knowledge from S and squeeze into M.
Figure 2. Structures of Mono-Net M, Stereo-Net S, and the multi-space knowledge distillation scheme. We propose to cascade the feature-driven adaptive refinement module with S and update network weights in a recursive manner. The multi-space knowledge distillation scheme is designed to transfer knowledge from S to M in the aspects of output space, feature space and long-range dependencies based on multi-scale feature extraction.
Results
Figure 3. Qualitative comparison with different methods on KITTI dataset. (a) Color image, (b) Ground-truth, (c) Xu et al., (d) Godard et al., (e) Zhan et al., (f) Pilzer et al., (g) Wong et al., (h) Ours.
Citation
Xinchen Ye, Xin Fan*, Mingliang Zhang, Wei Zhong, Rui Xu, Unsupervised Monocular Depth Estimation via
Recursive Stereo Distillation, IEEE Trans. Image Processing, accepted, 2021.
@article{Ye2021tip,
author = {Xinchen Ye, Xin Fan, Mingliang Zhang, Wei Zhong, Rui Xu},
title = {Unsupervised Monocular Depth Estimation via Recursive Stereo Distillation},
booktitle = {IEEE Trans. Image Processing (TIP)},
year={2021}, volume={0}, pages={0-0},
}