A One-Layer Decoder-Only Transformer is a Two-Layer RNN: With an Application to Certified Robustness

This paper reveals a key insight: a one-layer decoder-only Transformer is equivalent to a two-layer Recurrent Neural Network (RNN). Building on this insight, we propose ARC-Tran, a novel approach for verifying the robustness of decoder-only Transformers against arbitrary perturbation spaces. Existing robustness verification techniques are limited either to specific, length-preserving perturbations such as word substitutions, or to recursive models such as LSTMs. ARC-Tran addresses these limitations by carefully managing position encoding to prevent mismatches and by exploiting our key insight to achieve precise and scalable verification. Our evaluation shows that ARC-Tran (1) trains models that are more robust to arbitrary perturbation spaces than those produced by existing techniques and (2) achieves high certification accuracy on the resulting models.
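To make the recurrent reading of the title concrete, below is a minimal, hypothetical sketch (the function names and NumPy setup are our own, not the authors' code) of how one-layer causal self-attention can be computed token by token, carrying the accumulated keys and values forward as state, much as an RNN carries a hidden state. This is the standard incremental (KV-cache) view of decoder-only attention, offered only as intuition for the equivalence; the paper's precise two-layer RNN construction is given in the full text.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

# Illustrative sketch: one-layer causal self-attention unrolled as a
# recurrence. The "state" carried between steps is the accumulated
# key/value cache; this is the generic incremental view of decoder-only
# attention, not the paper's exact two-layer RNN construction.
def attention_as_recurrence(X, Wq, Wk, Wv):
    d = Wk.shape[1]
    keys, values, outputs = [], [], []   # keys/values act as the carried state
    for x_t in X:                        # process tokens one at a time, like an RNN
        q_t = x_t @ Wq
        keys.append(x_t @ Wk)
        values.append(x_t @ Wv)
        scores = np.array([q_t @ k / np.sqrt(d) for k in keys])
        weights = softmax(scores)        # attend only to positions seen so far
        outputs.append(weights @ np.stack(values))
    return np.stack(outputs)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))              # 5 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = attention_as_recurrence(X, Wq, Wk, Wv)
print(out.shape)                         # (5, 8): one output per step
```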

https://arxiv.org/abs/2405.17361

https://arxiv.org/pdf/2405.17361.pdf
