SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Multi-view stereo research on remote sensing images has advanced large-scale urban 3D reconstruction. However, remote sensing multi-view image data suffers from occlusion and uneven brightness between views during acquisition, which blurs details in depth estimation. To address this, we re-examine deformable learning in the multi-view stereo task and propose a novel paradigm based on view Space and Depth deformable Learning (SDL-MVS), which learns deformable interactions of features across view spaces and deformably models depth ranges and intervals to enable highly accurate depth estimation. Specifically, to suppress the view noise caused by occlusion and uneven brightness, we propose a Progressive Space deformable Sampling (PSS) mechanism, which progressively performs deformable learning of sampling points in the 3D frustum space and the 2D image space to adaptively embed source features into the reference feature. To further optimize the depth, we introduce Depth Hypothesis deformable Discretization (DHD), which precisely positions the depth prior by adaptively adjusting the depth range hypothesis and deformably discretizing the depth interval hypothesis. Through this deformable learning paradigm over view space and depth, SDL-MVS explicitly models the occlusion and uneven brightness encountered in multi-view stereo and achieves accurate multi-view depth estimation. Extensive experiments on the LuoJia-MVS and WHU datasets show that SDL-MVS reaches state-of-the-art performance. Notably, with three views as input, SDL-MVS achieves an MAE of 0.086, an accuracy of 98.9% for <0.6m, and 98.9% for <3-interval on the LuoJia-MVS dataset.
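For readers unfamiliar with the three reported numbers, the sketch below shows one common way such depth metrics are computed. It is an illustration, not the authors' evaluation code: the function name `depth_metrics`, its parameters, and the exact definitions (mean absolute error over valid pixels, share of pixels with error below 0.6 m, and share of pixels with error below 3 depth intervals) are assumptions, and benchmark scripts may apply additional filtering that is not reproduced here.

```python
from typing import Dict, Optional, Sequence


def depth_metrics(
    pred: Sequence[float],
    gt: Sequence[float],
    interval: float,
    valid: Optional[Sequence[bool]] = None,
    abs_threshold: float = 0.6,
    interval_factor: float = 3.0,
) -> Dict[str, float]:
    """Compute MAE and two threshold accuracies over valid pixels.

    Hypothetical helper: `pred` and `gt` are flattened depth maps in the
    same unit (assumed metres), `interval` is the depth sampling interval,
    and `valid` marks pixels that have ground truth.
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must have the same length")
    if valid is None:
        valid = [True] * len(gt)
    if len(valid) != len(gt):
        raise ValueError("valid mask must match the depth maps in length")

    # Absolute depth error for every valid pixel.
    errors = [abs(p - g) for p, g, v in zip(pred, gt, valid) if v]
    if not errors:
        raise ValueError("no valid pixels to evaluate")

    n = len(errors)
    return {
        # Mean absolute error.
        "mae": sum(errors) / n,
        # Share of pixels whose error is below the absolute threshold.
        "acc_abs": sum(e < abs_threshold for e in errors) / n,
        # Share of pixels whose error is below interval_factor intervals.
        "acc_interval": sum(e < interval_factor * interval for e in errors) / n,
    }


if __name__ == "__main__":
    # Toy data, not taken from the paper.
    predicted = [10.0, 10.5, 12.0, 9.0]
    reference = [10.0, 10.0, 10.0, 10.0]
    print(depth_metrics(predicted, reference, interval=0.4))
```

Keeping the two thresholds as parameters makes it easy to check how sensitive the accuracy figures are to the chosen cut-offs.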

https://arxiv.org/abs/2405.17140

https://arxiv.org/pdf/2405.17140.pdf
