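Xinchen Ye (Associate Professor)
Personal homepage: http://faculty.dlut.edu.cn/yexinchen/zh_CN/index.htm
Associate Professor, Master's Supervisor
Main positions: IEEE member, ACM member
Other positions: IEEE member, ACM member, CCF (China Computer Federation) member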

StereoDistill (TIP21)

Unsupervised Monocular Depth Estimation via Recursive Stereo Distillation

Xinchen Ye, Xin Fan*, Mingliang Zhang, Wei Zhong, Rui Xu

Dalian University of Technology

* Corresponding author

Code: https://github.com/goldenwoman/Recursive_Stereo_Disill

Paper: bare_jrnl.pdf

Abstract

Existing unsupervised monocular depth estimation methods use stereo image pairs, rather than ground-truth depth maps, as supervision for predicting scene depth. Because the network must accept a monocular input at test time, these methods cannot fully exploit stereo information during training, which leads to unsatisfactory depth estimation performance. We therefore propose a novel architecture consisting of a monocular network (Mono-Net) that infers depth maps from monocular inputs, and a stereo network (Stereo-Net) that further exploits stereo information by taking stereo pairs as input. During training, the more sophisticated Stereo-Net guides the learning of Mono-Net and improves its performance without changing its network structure or increasing its computational burden. At test time only the lightweight Mono-Net is used, so accurate monocular depth estimation is achieved with fast runtime. The framework raises two core questions: 1) how to design the Stereo-Net so that it accurately estimates depth maps by fully exploiting stereo information; and 2) how to use the Stereo-Net to improve the performance of Mono-Net. To this end, we propose a recursive estimation and refinement strategy for Stereo-Net to boost its depth estimation performance. Meanwhile, a multi-space knowledge distillation scheme is designed to help Mono-Net absorb the knowledge and expertise of Stereo-Net in a multi-scale fashion. Experiments demonstrate that our method achieves superior monocular depth estimation performance compared with other state-of-the-art methods.
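The training/testing asymmetry described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper or its released code: `stereo_net` is a fixed placeholder function of two views (the real Stereo-Net is itself a trained network), `MonoNet` is a two-parameter per-pixel affine model, and the "images" are random arrays. The only point is the data flow: the teacher sees both views during training, while the student sees a single image in both training and testing.

```python
import numpy as np

def stereo_net(left, right):
    # Placeholder teacher (assumption): any function of BOTH views.
    # This is not the paper's Stereo-Net and does not compute real depth.
    return 0.5 * left + 0.5 * right

class MonoNet:
    """Toy student (assumption): a per-pixel affine map of the left image only."""

    def __init__(self):
        self.a = 0.0
        self.b = 0.0

    def predict(self, left):
        return self.a * left + self.b

    def distill_step(self, left, teacher_depth, lr=0.1):
        # One gradient-descent step on the mean squared error to the teacher output.
        err = self.predict(left) - teacher_depth
        loss = float(np.mean(err ** 2))
        self.a -= lr * 2.0 * float(np.mean(err * left))
        self.b -= lr * 2.0 * float(np.mean(err))
        return loss  # loss measured before the update

rng = np.random.default_rng(0)
left = rng.random((16, 16))
right = 0.8 * left + 0.1              # toy stand-in for the second view

# Training: the teacher needs the stereo pair, the student only the left image.
mono = MonoNet()
target = stereo_net(left, right)
losses = [mono.distill_step(left, target) for _ in range(500)]

# Testing: only the student is called, with a single image as input.
test_image = rng.random((16, 16))
pred = mono.predict(test_image)
```

In this toy setup the student's loss against the teacher shrinks over the training steps, and the teacher is never needed once training is finished.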

Method

Figure 1. Network overview. It includes a Mono-Net M and a Stereo-Net S, where M is a lightweight network that takes a single image as input, while S takes a stereo image pair as input. S contains a recursive estimation strategy and a feature-driven adaptive refinement module to further improve the accuracy of depth estimation. The multi-space knowledge distillation scheme is designed to distill knowledge from S into M.

Figure 2. Structures of Mono-Net M, Stereo-Net S, and the multi-space knowledge distillation scheme. We cascade the feature-driven adaptive refinement module with S and update the network weights in a recursive manner. The multi-space knowledge distillation scheme transfers knowledge from S to M in terms of output space, feature space, and long-range dependencies, based on multi-scale feature extraction.
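As a rough illustration of what a distillation loss over the three spaces named in Figure 2 might look like, here is a minimal NumPy sketch. The specific terms are assumptions chosen for illustration, not the paper's definitions: an L1 difference on depth outputs, an L2 difference on feature maps, and an L2 difference between pairwise cosine-similarity (affinity) matrices as a stand-in for long-range dependencies. The function names, weights, and tensor shapes are hypothetical.

```python
import numpy as np

def affinity(feat):
    """Cosine similarity between all pairs of spatial positions of a (C, H, W) map."""
    c = feat.shape[0]
    flat = feat.reshape(c, -1)                               # (C, H*W)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    return flat.T @ flat                                     # (H*W, H*W)

def distill_loss(student, teacher, w_out=1.0, w_feat=1.0, w_aff=1.0):
    """student / teacher: lists of (depth, feature) pairs, one pair per scale."""
    total = 0.0
    for (d_s, f_s), (d_t, f_t) in zip(student, teacher):
        out_term = np.mean(np.abs(d_s - d_t))                # output space
        feat_term = np.mean((f_s - f_t) ** 2)                # feature space
        aff_term = np.mean((affinity(f_s) - affinity(f_t)) ** 2)  # long-range term
        total += w_out * out_term + w_feat * feat_term + w_aff * aff_term
    return float(total)

# Two scales of random stand-in outputs: depth (H, W) and features (C, H, W).
rng = np.random.default_rng(0)
teacher = [
    (rng.random((8, 8)), rng.random((4, 8, 8))),
    (rng.random((4, 4)), rng.random((4, 4, 4))),
]
# Student matches the teacher's features but is off by 0.1 in depth at each scale.
student = [(d + 0.1, f.copy()) for d, f in teacher]
loss = distill_loss(student, teacher)
```

Summing the per-scale terms is one simple way to express the "multi-scale" aspect; the relative weights `w_out`, `w_feat`, and `w_aff` would need tuning in any real setting.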

Results

Figure 3. Qualitative comparison of different methods on the KITTI dataset. (a) Color image, (b) Ground truth, (c) Xu et al., (d) Godard et al., (e) Zhan et al., (f) Pilzer et al., (g) Wong et al., (h) Ours.

Citation

Xinchen Ye, Xin Fan*, Mingliang Zhang, Wei Zhong, Rui Xu, Unsupervised Monocular Depth Estimation via Recursive Stereo Distillation, IEEE Trans. Image Processing, accepted, 2021.

@article{Ye2021tip,
author = {Xinchen Ye and Xin Fan and Mingliang Zhang and Wei Zhong and Rui Xu},
title = {Unsupervised Monocular Depth Estimation via Recursive Stereo Distillation},
journal = {IEEE Trans. Image Processing (TIP)},
year = {2021},
note = {accepted},
}