Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

We present the Linear Complexity Sequence Model (LCSM), a comprehensive solution that unites various sequence modeling techniques with linear complexity, including linear attention, state space models, long convolutions, and linear RNNs, within a single framework. The goal is to enhance comprehension of these models by analyzing the impact of each component from a cohesive and streamlined viewpoint. Specifically, we segment the modeling process of these models into three distinct stages: Expand, Oscillation, and Shrink (EOS), with each model having its own specific settings. The Expand stage projects the input signal onto a high-dimensional memory state. This is followed by recursive operations performed on the memory state in the Oscillation stage. Finally, the memory state is projected back to a low-dimensional space in the Shrink stage. We perform comprehensive experiments to analyze the impact of different stage settings on language modeling and retrieval tasks. Our results show that data-driven methods are crucial for the effectiveness of all three stages in language modeling, whereas hand-crafted methods yield better performance on retrieval tasks.
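As a rough illustration of the EOS recurrence described in the abstract, here is a minimal NumPy sketch. The variable names (`e_t`, `o_t`, `s_t`), the shapes, and the outer-product update are assumptions based on the stage descriptions above, not the paper's exact notation or code.

```python
# Minimal sketch of one Expand-Oscillation-Shrink (EOS) step, assuming a
# (d, k) memory state updated by an elementwise recurrent gate plus an
# outer-product lift of the input.
import numpy as np

def eos_step(m_prev, x_t, e_t, o_t, s_t):
    """One hypothetical EOS step.

    m_prev : (d, k) memory state from the previous step
    x_t    : (k,)   input signal at step t
    e_t    : (d,)   Expand vector: lifts x_t into the d x k memory state
    o_t    : (d, k) Oscillation gate: elementwise recurrent decay
    s_t    : (d,)   Shrink vector: projects the memory back to k dims
    """
    m_t = o_t * m_prev + np.outer(e_t, x_t)  # Expand + Oscillation
    y_t = m_t.T @ s_t                        # Shrink: (k,) output
    return m_t, y_t

# Toy usage over a short sequence.
d, k, T = 8, 4, 5
rng = np.random.default_rng(0)
m = np.zeros((d, k))
for t in range(T):
    x = rng.standard_normal(k)
    e = rng.standard_normal(d)    # data-driven Expand (would come from x_t in practice)
    o = np.full((d, k), 0.9)      # hand-crafted Oscillation: fixed exponential decay
    s = rng.standard_normal(d)    # data-driven Shrink
    m, y = eos_step(m, x, e, o, s)
```

Under this reading, linear attention corresponds to data-driven Expand and Shrink (keys and queries) with a trivial Oscillation, while SSM-style models fix the Oscillation to a hand-crafted decay, which matches the abstract's data-driven versus hand-crafted framing.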

https://arxiv.org/abs/2405.17383

https://arxiv.org/pdf/2405.17383.pdf
