A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning

$Q$-learning algorithms are appealing for real-world applications due to their data efficiency, but they are very prone to overfitting and training instabilities when trained from visual observations. Prior work, namely SVEA, finds that selective application of data augmentation can improve the visual generalization of RL agents without destabilizing training. We revisit its recipe for data augmentation and find an assumption that limits its effectiveness to augmentations of a photometric nature. Addressing this limitation, we propose a generalized recipe, SADA, that works with a wider variety of augmentations. We benchmark its effectiveness on DMC-GB2, our proposed extension of the popular DMControl Generalization Benchmark, as well as on tasks from Meta-World and the Distracting Control Suite, and find that SADA greatly improves the training stability and generalization of RL agents across a diverse set of augmentations. Visualizations, code, and benchmark: see this https URL
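To make the "selective application" idea concrete, here is a minimal PyTorch-style sketch of the recipe the abstract attributes to SVEA: the bootstrap target is computed from clean observations only, while the online critic is trained on both the clean and the augmented view of the same input. This is an illustrative sketch under our own assumptions (a discrete-action Q-learning variant for brevity, though the paper targets continuous control); critic, critic_target, and augment are hypothetical placeholders, not the authors' API.

import torch
import torch.nn.functional as F

def critic_update(critic, critic_target, augment, batch, gamma=0.99):
    # Hypothetical sketch of selective data augmentation in Q-learning,
    # in the spirit of the SVEA recipe named in the abstract.
    obs, action, reward, next_obs, done = batch

    with torch.no_grad():
        # Target stream stays unaugmented so value targets remain stable.
        next_q = critic_target(next_obs).max(dim=1, keepdim=True).values
        target = reward + gamma * (1.0 - done) * next_q

    # Online stream: regress both the clean and the augmented view of the
    # observation toward the same clean target.
    q_clean = critic(obs).gather(1, action)
    q_aug = critic(augment(obs)).gather(1, action)
    return F.mse_loss(q_clean, target) + F.mse_loss(q_aug, target)

One plausible reading of the limitation the abstract identifies: sharing a clean target across views is only sound when the augmentation preserves the scene's geometry (photometric changes such as color jitter), whereas geometric augmentations such as rotations or large shifts move task-relevant pixels and call for the generalized treatment SADA proposes.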

https://arxiv.org/abs/2405.17416

https://arxiv.org/pdf/2405.17416.pdf
