Collaborative Video Diffusion: Consistent Multi-video Generation with Camera Control

Research on video generation has recently made tremendous progress, enabling high-quality videos to be generated from text prompts or images. Adding control to the video generation process is an important goal moving forward and recent approaches that condition video generation models on camera trajectories make strides towards it. Yet, it remains challenging to generate a video of the same scene from multiple different camera trajectories. Solutions to this multi-video generation problem could enable large-scale 3D scene generation with editable camera trajectories, among other applications. We introduce collaborative video diffusion (CVD) as an important step towards this vision. The CVD framework includes a novel cross-video synchronization module that promotes consistency between corresponding frames of the same video rendered from different camera poses using an epipolar attention mechanism. Trained on top of a state-of-the-art camera-control module for video generation, CVD generates multiple videos rendered from different camera trajectories with significantly better consistency than baselines, as shown in extensive experiments. Project page: this https URL.
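To make the cross-video synchronization idea concrete, below is a minimal sketch of what an epipolar attention layer of this kind might look like. This is not the paper's implementation: all names (`skew`, `fundamental_matrix`, `epipolar_attention`, `sigma`) are illustrative, and it assumes a shared pinhole intrinsic matrix `K` and a known relative pose `(R, t)` between the two cameras at a pair of corresponding frames.

```python
# A minimal sketch of cross-video epipolar attention (illustrative, not
# the CVD release). Assumes both videos share intrinsics K and that the
# relative pose (R, t) between the two cameras at this frame is known.
import torch


def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x @ v == cross(t, v)."""
    tx, ty, tz = t.tolist()
    return torch.tensor([[0.0, -tz,  ty],
                         [tz,  0.0, -tx],
                         [-ty, tx,  0.0]])


def fundamental_matrix(K, R, t):
    """F = K^-T [t]_x R K^-1 maps a pixel in view 1 to its epipolar line in view 2."""
    K_inv = torch.linalg.inv(K)
    return K_inv.T @ skew(t) @ R @ K_inv


def epipolar_attention(feat_q, feat_kv, K, R, t, sigma=2.0):
    """
    feat_q:  (H, W, C) features of a frame from video 1 (queries).
    feat_kv: (H, W, C) features of the corresponding frame from video 2.

    Attention logits between a query pixel and every key pixel are
    down-weighted by the key's distance to the query's epipolar line,
    softly restricting attention to geometrically consistent locations.
    """
    H, W, C = feat_q.shape
    F_mat = fundamental_matrix(K, R, t)

    # Homogeneous pixel grid (x, y, 1) shared by both frames.
    ys, xs = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                            torch.arange(W, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).reshape(-1, 3)

    # Epipolar line l = F p for each query pixel p, then each key pixel's
    # point-to-line distance |l . q| / sqrt(l_x^2 + l_y^2).
    lines = pix @ F_mat.T                                      # (HW, 3)
    num = torch.abs(lines @ pix.T)                             # (HW_q, HW_k)
    denom = torch.linalg.norm(lines[:, :2], dim=-1, keepdim=True).clamp(min=1e-8)
    dist = num / denom

    q = feat_q.reshape(-1, C)
    k = feat_kv.reshape(-1, C)
    logits = (q @ k.T) / C ** 0.5 - (dist / sigma) ** 2        # Gaussian epipolar bias
    attn = logits.softmax(dim=-1)
    return (attn @ feat_kv.reshape(-1, C)).reshape(H, W, C)
```

The geometric intuition: for a 3-D point visible in both frames, its two projections satisfy p2ᵀ F p1 = 0, so a key pixel lying far from the query's epipolar line cannot depict the same point. The Gaussian bias therefore concentrates each query's attention mass on the small set of geometrically plausible correspondences, which is what pushes the two generated videos toward rendering a consistent scene.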

https://arxiv.org/abs/2405.17414

https://arxiv.org/pdf/2405.17414.pdf
