DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation


There has been a recent surge of interest in learning to perceive depth from monocular videos in an unsupervised fashion. A key challenge in this field is achieving robust and accurate depth estimation in challenging scenarios, particularly in regions with weak textures or where dynamic objects are present. This study makes three major contributions by delving deeply into dense correspondence priors to provide existing frameworks with explicit geometric constraints. The first novelty is a contextual-geometric depth consistency loss, which employs depth maps triangulated from dense correspondences based on estimated ego-motion to guide the learning of depth perception from contextual information, since explicitly triangulated depth maps capture accurate relative distances among pixels. The second novelty arises from the observation that there exists an explicit, deducible relationship between optical flow divergence and depth gradient. A differential property correlation loss is therefore designed to refine depth estimation with a specific emphasis on local variations. The third novelty is a bidirectional stream co-adjustment strategy that enhances the interaction between rigid and optical flows, encouraging the former towards more accurate correspondence and making the latter more adaptable across various scenarios under the static scene hypothesis. DCPI-Depth, a framework that incorporates all these innovative components and couples two bidirectional and collaborative streams, achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming all existing prior arts. Specifically, it demonstrates accurate depth estimation in texture-less and dynamic regions, and shows more reasonable smoothness.
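The first contribution rests on depth maps triangulated from dense correspondences and the estimated ego-motion. As a rough illustration of that geometric prior (not the paper's actual implementation), here is a minimal NumPy sketch of linear (DLT) triangulation for a single matched pixel, assuming known intrinsics `K` and a relative pose `(R, t)`; the function name and interface are hypothetical:

```python
import numpy as np

def triangulate_depth(K, R, t, p1, p2):
    """Triangulate the depth of one matched pixel via linear (DLT) triangulation.

    K:  3x3 camera intrinsics (assumed shared by both frames).
    R, t: estimated ego-motion mapping camera-1 points to camera 2
          (x2 = R @ x1 + t), e.g. from a pose network.
    p1, p2: corresponding pixel coordinates (u, v) in frames 1 and 2,
            e.g. taken from a dense optical-flow field.
    Returns the depth (z coordinate) of the point in camera-1 coordinates.
    """
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])      # frame-1 projection
    P2 = K @ np.hstack([R, np.asarray(t).reshape(3, 1)])   # frame-2 projection
    # Each view contributes two linear constraints on the homogeneous
    # 3D point X: u * P[2] - P[0] = 0 and v * P[2] - P[1] = 0.
    A = np.stack([
        p1[0] * P1[2] - P1[0],
        p1[1] * P1[2] - P1[1],
        p2[0] * P2[2] - P2[0],
        p2[1] * P2[2] - P2[1],
    ])
    # The null vector of A (last right-singular vector) is X up to scale;
    # the ratio X[2] / X[3] is invariant to that scale.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[2] / X[3]  # depth in frame 1
```

Applied densely over a flow field, such triangulated depths give per-pixel targets whose relative distances are geometrically grounded, which is what the contextual-geometric depth consistency loss exploits.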

https://arxiv.org/abs/2405.16960

https://arxiv.org/pdf/2405.16960.pdf
