THREAD: Thinking Deeper with Recursive Spawning

Large language models (LLMs) have shown impressive capabilities across diverse settings, but still struggle as the length and complexity of the context increases. To address this challenge, we propose Thinking Recursively and Dynamically (ThReaD). THREAD frames model generation as a thread of execution that, based on the context, can run to completion or dynamically spawn new threads. By spawning, threads can offload work (e.g., thinking, retrieving information) to child threads, which only return tokens needed for the parent thread to do its work. In effect, this enables the model to adapt, as needed, the amount of intermediate work used to produce tokens. We apply THREAD in the settings of LLM task solving and question answering, where the dynamic threading allows the model to recursively decompose the given task or question into progressively simpler sub-problems that can be solved by separate child threads. We test THREAD, implemented using a few-shot learning approach, on diverse benchmarks for agent tasks and data-grounded question answering. THREAD achieves state-of-the-art performance with GPT-4 and GPT-3.5 on these benchmarks, including ALFWorld, TextCraft, and WebShop, along with two new benchmarks, DataCommons QA and MIMIC-III ICU QA. In addition, THREAD outperforms existing frameworks by 10 to 50 absolute percentage points with smaller models, including Llama-3-8b and CodeLlama-7b.
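The core mechanism described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `mock_llm` is a hypothetical stand-in for a real model call, and the "spawn"/"answer" decision it returns is an assumed interface. The point it demonstrates is the control flow: a parent thread offloads sub-problems to child threads, and each child's intermediate work stays local, with only its final result returned to the parent.

```python
def mock_llm(task: str) -> tuple:
    """Stand-in for an LLM call (hypothetical interface).

    Given a '+'-separated arithmetic task, it either answers a single
    term directly or decides to spawn child threads on two halves.
    """
    terms = task.split("+")
    if len(terms) == 1:
        return ("answer", int(terms[0]))  # base case: run to completion
    mid = len(terms) // 2
    return ("spawn", ["+".join(terms[:mid]), "+".join(terms[mid:])])


def run_thread(task: str) -> int:
    """Execute one thread; recursively spawn children as needed.

    Each child thread does its own intermediate work, but only its
    final value flows back to the parent, keeping the parent's
    context short regardless of how deep the decomposition goes.
    """
    action, payload = mock_llm(task)
    if action == "answer":
        return payload
    child_results = [run_thread(sub) for sub in payload]
    return sum(child_results)


print(run_thread("1+2+3+4"))  # → 10
```

Because each level of recursion handles a strictly simpler sub-problem, the depth of spawning adapts to the task: a trivial input terminates immediately, while a complex one fans out into as many child threads as its structure requires.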

https://arxiv.org/abs/2405.17402

https://arxiv.org/pdf/2405.17402.pdf
