MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds


We introduce 4D Motion Scaffolds (MoSca), a neural information processing system designed to reconstruct and synthesize novel views of dynamic scenes from monocular videos captured casually in the wild. To address this challenging and ill-posed inverse problem, we leverage prior knowledge from foundational vision models and lift the video data to a novel Motion Scaffold (MoSca) representation, which compactly and smoothly encodes the underlying motions/deformations. The scene geometry and appearance are then disentangled from the deformation field and are encoded by globally fusing the Gaussians anchored onto the MoSca, optimized via Gaussian Splatting. Additionally, camera poses can be seamlessly initialized and refined during the dynamic rendering process, without the need for other pose estimation tools. Experiments demonstrate state-of-the-art performance on dynamic rendering benchmarks.
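To make the "Gaussians anchored onto the MoSca" idea concrete, below is a minimal NumPy sketch of how points (e.g., Gaussian centers) attached to a sparse set of scaffold nodes can be warped to another timestep by blending the rigid motions of their nearest nodes. All names and the Gaussian-falloff weighting here are hypothetical choices for illustration; the paper's actual node parameterization, interpolation scheme, and optimization details differ and should be taken from the source.

```python
# Minimal sketch (not the authors' code): warping points anchored on a
# motion scaffold from a reference time t0 to a query time t.
# Assumption (hypothetical): each scaffold node carries a rigid transform
# per timestep, and points are deformed by blending the transforms of
# their K nearest nodes, in the spirit of blend skinning.
import numpy as np

def blend_warp(points, node_pos_t0, node_R, node_t, K=4):
    """Warp `points` (N,3) given scaffold node positions at the reference
    time `node_pos_t0` (M,3) and each node's rigid motion to the query
    time: rotations `node_R` (M,3,3) and translations `node_t` (M,3)."""
    # Distance from every point to every scaffold node at t0.
    d = np.linalg.norm(points[:, None, :] - node_pos_t0[None, :, :], axis=-1)
    knn = np.argsort(d, axis=1)[:, :K]                    # (N,K) nearest nodes
    w = np.exp(-np.take_along_axis(d, knn, axis=1) ** 2)  # Gaussian falloff
    w /= w.sum(axis=1, keepdims=True)                     # normalized weights
    # Apply each neighbor node's rigid transform about the node, then blend.
    rel = points[:, None, :] - node_pos_t0[knn]           # (N,K,3) offsets
    moved = np.einsum('nkij,nkj->nki', node_R[knn], rel)  # rotate offsets
    moved += node_pos_t0[knn] + node_t[knn]               # translate
    return (w[..., None] * moved).sum(axis=1)             # (N,3) blended

# Toy usage: two nodes translating rigidly drag an anchored point along.
pts = np.array([[0.5, 0.0, 0.0]])
nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
R = np.stack([np.eye(3)] * 2)
t = np.array([[0.0, 0.1, 0.0], [0.0, 0.1, 0.0]])
print(blend_warp(pts, nodes, R, t, K=2))  # -> [[0.5, 0.1, 0.0]]
```

The appeal of this structure, as described in the abstract, is the disentanglement: geometry and appearance live on the (time-invariant) Gaussians, while all temporal variation is carried by the compact, smooth scaffold motion.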

https://arxiv.org/abs/2405.17421

https://arxiv.org/pdf/2405.17421.pdf
