Transformers Can Do Arithmetic with the Right Embeddings

The poor performance of transformers on arithmetic tasks seems to stem in large part from their inability to keep track of the exact position of each digit inside a large span of digits. We mend this problem by adding an embedding to each digit that encodes its position relative to the start of the number. In addition to the boost these embeddings provide on their own, we show that this fix enables architectural modifications such as input injection and recurrent layers to improve performance even further. With positions resolved, we can study the logical extrapolation ability of transformers. Can they solve arithmetic problems that are larger and more complex than those in their training data? We find that by training on only 20-digit numbers with a single GPU for one day, we can reach state-of-the-art performance, achieving up to 99% accuracy on 100-digit addition problems. Finally, we show that these gains in numeracy also unlock improvements on other multi-step reasoning tasks including sorting and multiplication.
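To make the core idea concrete, here is a minimal sketch of a digit-position embedding: each digit token gets a learned embedding indexed by its offset from the start of the number it belongs to, added on top of the usual token embeddings. The tokenization, module names, and the `max_digits` cap are illustrative assumptions, not the authors' implementation.

```python
# Sketch of a per-digit positional embedding (assumed details, not the paper's code).
import torch
import torch.nn as nn


def digit_position_ids(tokens: list[str], max_digits: int = 100) -> torch.Tensor:
    """Assign each digit token its 1-based offset within its number; non-digits get 0."""
    ids, offset = [], 0
    for tok in tokens:
        offset = min(offset + 1, max_digits) if tok.isdigit() else 0
        ids.append(offset)
    return torch.tensor(ids)


class DigitPositionEmbedding(nn.Module):
    """Learned embedding added to the token embeddings of digit tokens."""

    def __init__(self, d_model: int, max_digits: int = 100):
        super().__init__()
        # Index 0 is reserved for non-digit tokens and maps to a zero vector.
        self.emb = nn.Embedding(max_digits + 1, d_model, padding_idx=0)

    def forward(self, token_embeddings: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
        return token_embeddings + self.emb(position_ids)


if __name__ == "__main__":
    tokens = list("123+4567=")          # character-level tokenization, one digit per token
    pos = digit_position_ids(tokens)    # -> [1, 2, 3, 0, 1, 2, 3, 4, 0]
    x = torch.randn(len(tokens), 32)    # stand-in token embeddings, d_model = 32
    out = DigitPositionEmbedding(32)(x, pos)
    print(pos.tolist(), out.shape)
```

Because the index restarts at each number boundary, the model sees the same positional signal for, say, the third digit of a 5-digit operand and the third digit of a 50-digit one, which is what lets length generalization kick in.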

https://arxiv.org/abs/2405.17399

https://arxiv.org/pdf/2405.17399.pdf
