Rethinking Transformers in Solving POMDPs

Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability. This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical limitations. We establish that regular languages, which Transformers struggle to model, are reducible to POMDPs. This poses a significant challenge for Transformers in learning POMDP-specific inductive biases, due to their lack of the inherent recurrence found in other models like RNNs. This paper casts doubt on the prevalent belief in Transformers as sequence models for RL and proposes introducing a point-wise recurrent structure. The Deep Linear Recurrent Unit (LRU) emerges as a well-suited alternative for Partially Observable RL, with empirical results highlighting the sub-optimal performance of the Transformer and the considerable strength of the LRU.
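The reduction from regular languages to POMDPs can be illustrated with the parity language (bit strings containing an even number of 1s): hide the DFA state, reveal only the current input symbol, and reward the agent for a correct accept/reject decision at the end. The sketch below is my own illustration of this idea, not the paper's construction; names like `ParityPOMDP` are hypothetical.

```python
import random

class ParityPOMDP:
    """The parity language (even number of 1s) wrapped as a POMDP.

    The DFA state (running parity) is hidden; each observation is only the
    current input symbol, so a memoryless policy cannot decide membership
    and the agent must carry recurrent state across time steps.
    """

    def __init__(self, length=8, seed=None):
        self.length = length
        self.rng = random.Random(seed)

    def reset(self):
        self.bits = [self.rng.randint(0, 1) for _ in range(self.length)]
        self.t = 0
        return self.bits[0]  # observation: the current symbol only

    def step(self, action):
        self.t += 1
        if self.t < self.length:
            return self.bits[self.t], 0.0, False  # (obs, reward, done)
        # Terminal step: action 1 means "accept" (claim an even number of 1s).
        even = sum(self.bits) % 2 == 0
        reward = 1.0 if (action == 1) == even else 0.0
        return None, reward, True

# A policy with one bit of recurrent memory (the DFA state) solves it exactly.
env = ParityPOMDP(length=8, seed=0)
obs, done, memory = env.reset(), False, 0
while not done:
    memory ^= obs                      # DFA transition on the observed symbol
    action = 1 if memory == 0 else 0   # accept iff parity so far is even
    obs, reward, done = env.step(action)
print(reward)  # 1.0
```

The key point is visible in the loop: the optimal action depends on `memory`, a function of the whole observation history, which is exactly the kind of inductive bias a recurrent model provides for free.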

https://arxiv.org/abs/2405.17358

https://arxiv.org/pdf/2405.17358.pdf
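The proposed alternative, the Linear Recurrent Unit, is built around a linear recurrence with a diagonal complex state-transition matrix. Below is a minimal NumPy sketch of that recurrence only; the dimensions, initialization ranges, and the sequential scan are illustrative choices of mine, not the paper's implementation (which uses learned parameters and can be computed with a parallel scan).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration.
d_in, d_state = 4, 8

# Diagonal complex state matrix with |lambda| < 1 so the recurrence is stable.
mag = rng.uniform(0.5, 0.99, d_state)        # magnitudes in (0, 1)
theta = rng.uniform(0, 2 * np.pi, d_state)   # phases
lam = mag * np.exp(1j * theta)               # diagonal of Lambda

# Complex input/output projections and a real skip connection.
B = rng.normal(size=(d_state, d_in)) + 1j * rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_in, d_state)) + 1j * rng.normal(size=(d_in, d_state))
D = rng.normal(size=(d_in, d_in))

def lru_scan(u):
    """Run x_t = lam * x_{t-1} + B u_t over a sequence.

    u: (T, d_in) real inputs; returns (T, d_in) real outputs
    y_t = Re(C x_t) + D u_t.
    """
    x = np.zeros(d_state, dtype=complex)
    ys = []
    for u_t in u:
        x = lam * x + B @ u_t            # element-wise (diagonal) recurrence
        ys.append((C @ x).real + D @ u_t)
    return np.stack(ys)

u = rng.normal(size=(16, d_in))
y = lru_scan(u)
print(y.shape)  # (16, 4)
```

Because the recurrence is linear and `lam` is diagonal, each state channel evolves independently, which is what makes this layer both parallelizable at training time and able to carry information across long horizons.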
