The Expressive Capacity of State Space Models: A Formal Language Perspective

Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little understanding of the in-principle abilities of such models, which could provide useful guidance to the search for better LM architectures. We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs. We find that SSMs and transformers have overlapping but distinct strengths. In star-free state tracking, SSMs implement straightforward and exact solutions to problems that transformers struggle to represent exactly. They can also model bounded hierarchical structure with optimal memory even without simulating a stack. On the other hand, we identify a design choice in current SSMs that limits their expressive power. We discuss implications for SSM and LM research, and verify results empirically on a recent SSM, Mamba.
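
As a rough illustration of the kind of recurrence the paper analyzes, here is a minimal sketch of a diagonal linear SSM update with input-dependent (selective, Mamba-style) gates. The function name `ssm_step`, the gate values, and the toy "flip-flop" task below are illustrative assumptions, not code from the paper; they are only meant to show how a linear state update can track a star-free state-tracking problem exactly.

```python
import numpy as np

def ssm_step(h_prev, x_t, a_t, b_t):
    """One step of a diagonal linear SSM: h_t = a_t * h_{t-1} + b_t * x_t.

    In selective SSMs such as Mamba, a_t and b_t depend on the input token;
    crucially, the recurrence stays linear in the state h, which is the
    property the paper's expressivity analysis targets.
    """
    return a_t * h_prev + b_t * x_t

# Toy flip-flop-style state tracking: remember the last written value.
# A "write" token overwrites the state (a_t = 0, b_t = 1);
# an "ignore" token leaves it unchanged (a_t = 1, b_t = 0).
h = np.zeros(1)
for cmd, val in [("write", 3.0), ("ignore", 7.0), ("write", 5.0), ("ignore", 1.0)]:
    if cmd == "write":
        h = ssm_step(h, np.array([val]), a_t=0.0, b_t=1.0)
    else:
        h = ssm_step(h, np.array([val]), a_t=1.0, b_t=0.0)
print(h)  # [5.0] -- the last written value, tracked exactly in a single state cell
```

In this sketch the gating alone solves the tracking problem exactly and with constant memory, which is the flavor of "straightforward and exact solutions" the abstract attributes to SSMs on star-free state tracking.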

https://arxiv.org/abs/2405.17394

https://arxiv.org/pdf/2405.17394.pdf
