RB-Modulation: Training-Free Personalization of Diffusion Models using Stochastic Optimal Control

We propose Reference-Based Modulation (RB-Modulation), a new plug-and-play solution for training-free personalization of diffusion models. Existing training-free approaches exhibit difficulties in (a) style extraction from reference images in the absence of additional style or content text descriptions, (b) unwanted content leakage from reference style images, and (c) effective composition of style and content. RB-Modulation is built on a novel stochastic optimal controller where a style descriptor encodes the desired attributes through a terminal cost. The resulting drift not only overcomes the difficulties above, but also ensures high fidelity to the reference style and adheres to the given text prompt. We also introduce a cross-attention-based feature aggregation scheme that allows RB-Modulation to decouple content and style from the reference image. With theoretical justification and empirical evidence, our framework demonstrates precise extraction and control of content and style in a training-free manner. Further, our method allows a seamless composition of content and style, which marks a departure from the dependency on external adapters or ControlNets.
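The core idea of the terminal-cost-guided drift can be illustrated with a toy sketch. The snippet below is a minimal, hypothetical illustration, not the paper's implementation: the "style descriptor" is a stand-in linear map (the paper uses a learned feature extractor), the base drift is a toy denoising drift toward the prior mean, and the control term is the gradient of a quadratic terminal cost that pulls samples toward the reference style's descriptor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear "style descriptor" standing in for a learned
# feature extractor; any differentiable map would play the same role.
W = rng.standard_normal((2, 2))

def descriptor(x):
    return x @ W

x_ref = np.array([3.0, -2.0])   # toy 2-D "reference style image"
target = descriptor(x_ref)      # desired style attributes

def terminal_cost(x):
    # Quadratic terminal cost: squared distance between the sample's
    # descriptor and the reference descriptor.
    d = descriptor(x) - target
    return float(d @ d)

def grad_terminal_cost(x):
    # Analytic gradient of ||x @ W - target||^2 with respect to x.
    return 2.0 * (descriptor(x) - target) @ W.T

def sample(controlled, steps=500, dt=0.01, guidance=1.0):
    # Euler-Maruyama simulation of a toy reverse-time SDE.
    x = rng.standard_normal(2) * 3.0
    for _ in range(steps):
        drift = -x  # toy base drift pulling toward the prior mean
        if controlled:
            # Control correction: descend the terminal cost, which
            # steers the trajectory toward the reference style.
            drift = drift - guidance * grad_terminal_cost(x)
        x = x + drift * dt + np.sqrt(dt) * 0.1 * rng.standard_normal(2)
    return x

x_free = sample(controlled=False)
x_ctrl = sample(controlled=True)
```

After sampling, `terminal_cost(x_ctrl)` is far smaller than `terminal_cost(x_free)`: the corrected drift lands the sample near the reference descriptor, while the uncontrolled chain settles near the prior mean.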

https://arxiv.org/abs/2405.17401

https://arxiv.org/pdf/2405.17401.pdf
