RecurrentGPT: a loom-type tool with a twist


This is a link post for the paper “RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text” by a team from ETH Zürich (Swiss Federal Institute of Technology in Zürich).

The recent post The Compleat Cybornaut discusses looming and the evolving field of loom-type tools. We are likely to see more loom-type tools in the near future, and it may help their authors to be aware of the details of the RecurrentGPT tool.

The authors of RecurrentGPT do not seem to be aware of Simulator theory or Cyborgism, but they do seem to be influenced by the same zeitgeist. The abstract begins as follows (italics mine):

The fixed-size context of Transformer makes GPT models incapable of generating arbitrarily long text. In this paper, we introduce RecurrentGPT, a language-based simulacrum of the recurrence mechanism in RNNs. RecurrentGPT is built upon a large language model (LLM) such as ChatGPT and uses natural language to simulate the Long Short-Term Memory mechanism in an LSTM.

AaltoMediaAI notes[1]:

This is not the first paper to use LLMs to both generate text and make notes/edits into a long-term memory. However, this one is interesting in that the approach mimics the operation of LSTMs, on a textual level, using prompt engineering.

GPT-4 comments on the abstract of this paper as follows[2]:

The paper “RecurrentGPT: Interactive Generation of (Arbitrarily) Long Text” by Wangchunshu Zhou, Yuchen Eleanor Jiang, Peng Cui, Tiannan Wang, Zhenxin Xiao, Yifan Hou, Ryan Cotterell, and Mrinmaya Sachan proposes a novel approach to address a key limitation of GPT models: the inability to generate arbitrarily long text due to their fixed-size context.

In the paper, the authors introduce RecurrentGPT, which utilizes a language-based simulation of the recurrence mechanism found in Recurrent Neural Networks (RNNs). This model builds upon a Large Language Model (LLM), such as ChatGPT, and simulates the Long Short-Term Memory (LSTM) mechanism in a language-based format.

RecurrentGPT operates by generating a paragraph of text at each timestep and updating its language-based long-short term memory, which is stored on the hard drive, as well as the prompt. This allows RecurrentGPT to generate texts of arbitrary length without forgetting any of the context. Moreover, since these language-based memories can be observed and edited by users, RecurrentGPT is interpretable and allows for interactive generation of long text. This makes it a step towards next-generation computer-assisted writing systems that go beyond local editing suggestions.

An interesting aspect of RecurrentGPT is its proposed usage as an interactive fiction that directly interacts with consumers, a concept the authors term as “AI As Contents” (AIAC). The authors believe AIAC to be the next form of conventional AI-generated content (AIGC). They also suggest the possibility of using RecurrentGPT to create personalized interactive fiction that directly interacts with readers instead of writers.

In a broader context, RecurrentGPT exemplifies the usefulness of borrowing ideas from popular model designs in cognitive science and deep learning to enhance the capabilities of LLMs. The authors have made their code available for further exploration and also provide an online demo of RecurrentGPT.

Now for the harder part: understanding the core of their work in more depth.

We know that feed-forward Transformers function as RNNs when used in autoregressive mode[3]. Moreover, when a Transformer generates text autoregressively by sampling from its output distributions, the recurrent state is encoded in the generated text itself.
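To make the “state is the text” point concrete, here is a toy Python sketch. The `next_token_dist` function is a hypothetical stand-in for a real Transformer, and `generate` and `remembers_start` are illustrative names, not from any library. The only thing carried from one step to the next is the token list, and the fixed `context_size` window is what makes older tokens invisible, which is exactly the limitation RecurrentGPT targets.

```python
import random
from typing import Callable

def generate(next_token_dist: Callable[[list[str]], dict[str, float]],
             prompt: list[str], steps: int, context_size: int, seed: int = 0) -> list[str]:
    """Autoregressive sampling viewed as a recurrence: the state is the text so far."""
    rng = random.Random(seed)
    text = list(prompt)
    for _ in range(steps):
        window = text[-context_size:]          # fixed-size context: only recent tokens are visible
        dist = next_token_dist(window)         # hypothetical model call
        tokens, weights = zip(*dist.items())
        text.append(rng.choices(tokens, weights=weights)[0])  # the sample feeds back into the state
    return text

def remembers_start(window: list[str]) -> dict[str, float]:
    # Hypothetical toy "model": emits "x" while START is still in view, otherwise "y".
    return {"x": 1.0} if "START" in window else {"y": 1.0}

print(generate(remembers_start, ["START"], steps=4, context_size=2))
# → ['START', 'x', 'x', 'y', 'y']
```

Once `START` slides out of the two-token window, the toy model can no longer “see” it, even though it is still present in the generated text.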

We also know that the essence of combating vanishing gradients in architectures such as LSTM is to ensure that a single iteration transforms the state X into X + ε for some small ε[4].
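The “X to X + small ε” idea can be written out as a short derivation. This is the standard textbook argument, given here as a sketch rather than anything taken from the RecurrentGPT paper; J_t is notation introduced for this illustration.

```latex
\begin{aligned}
X_{t+1} &= X_t + \varepsilon_t(X_t),
  \qquad \frac{\partial X_{t+1}}{\partial X_t} = I + J_t,
  \qquad J_t := \frac{\partial \varepsilon_t}{\partial X_t},\\
\frac{\partial X_T}{\partial X_0} &= (I + J_{T-1})(I + J_{T-2})\cdots(I + J_0)
  \quad \text{stays close to } I \text{ when every } J_t \text{ is small},\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
     = c_{t-1} + \underbrace{(f_t - 1) \odot c_{t-1} + i_t \odot \tilde{c}_t}_{\varepsilon_t},
  \qquad \left.\frac{\partial c_t}{\partial c_{t-1}}\right|_{\text{direct path}} = \operatorname{diag}(f_t).
\end{aligned}
```

The last line is the usual LSTM cell-state update: along the direct cell-state path the Jacobian is diag(f_t), so forget gates near 1 give the “X + small ε” behavior (this ignores the indirect dependence of the gates on c_{t-1} through h_{t-1}). By contrast, a plain RNN multiplies by a full weight matrix at every step, and the product of such factors can shrink geometrically.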

In this case, the authors were only informally inspired by LSTM; this is not a literal LSTM implementation. The long-term memory is implemented via a well-known VectorDB approach, so for all practical purposes its size is unbounded[5].
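A minimal sketch of such an unbounded long-term memory follows. It assumes a toy bag-of-words embedding in place of the SentenceTransformers model the paper uses, and the names `embed`, `cosine` and `LongTermMemory` are hypothetical, not taken from the RecurrentGPT code. The store only grows, while retrieval returns just the top `k` matches, so the amount of memory that reaches the prompt stays bounded.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class LongTermMemory:
    """Append-only store of generated paragraphs, queried by embedding similarity."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, paragraph: str) -> None:
        self.entries.append((paragraph, embed(paragraph)))

    def search(self, query: str, k: int = 2) -> list[str]:
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

mem = LongTermMemory()
mem.add("The knight rode north to the frozen castle.")
mem.add("Meanwhile the merchant counted his gold in the harbor town.")
mem.add("At the castle gate the knight met an old hermit.")
print(mem.search("What happened at the castle?", k=2))
# → ['At the castle gate the knight met an old hermit.', 'The knight rode north to the frozen castle.']
```

Swapping `embed` for a real sentence encoder and the linear scan for an approximate nearest-neighbor index would not change the interface.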

The core of the paper is Section 2, which is slightly under two pages long. I asked GPT-4 to summarize that section and supplied it with the section’s LaTeX source (the comments in the LaTeX source of Section 2.2 contain a methodological discussion that is helpful for understanding this material)[6].

Here are the two central paragraphs of GPT-4’s summary of Section 2:

The model utilizes “language-based building blocks,” which include an input/output system and a Long-Short Term Memory mechanism. The input/output system, referred to as “content” and “plan,” is akin to an outline of text to be generated (plan) and the generated text (content). The model also incorporates a long-term memory mechanism, which stores previously generated content using a VectorDB approach, allowing the model to store more information than memory-based Transformers. The short-term memory is a brief paragraph summarizing recent steps, intended to maintain coherence in the generated content.

For its recurrent computation, the \baby[7] model relies on a prompt template and some Python code to mimic the recurrent computation scheme of RNNs. The model constructs input prompts by filling the prompt template with input content/plan and its internal long-short term memory. It also has a built-in feature to encourage diversity in output and interactivity by generating multiple plans, allowing human users to select or write the most suitable plan.
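The recurrent step described in this summary can be sketched in Python. This is a hypothetical illustration, not the authors’ code: the prompt wording, the `State` fields, and the `llm` and `choose_plan` callables are assumptions, and a crude word-overlap `recall` stands in for the VectorDB search. The `llm` callable is assumed to return an already parsed (paragraph, new short-term memory, candidate plans) triple; a real version would have to parse ChatGPT’s text output.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical prompt template; the paper's actual wording differs.
PROMPT_TEMPLATE = (
    "Relevant earlier paragraphs:\n{long_term}\n\n"
    "Summary of recent steps:\n{short_term}\n\n"
    "Previous paragraph:\n{content}\n\n"
    "Plan for the next paragraph:\n{plan}\n\n"
    "Write the next paragraph, an updated summary, and {n_plans} candidate plans."
)

@dataclass
class State:
    content: str     # last generated paragraph (the "output")
    plan: str        # outline for the next paragraph (the "input")
    short_term: str  # brief summary of recent steps
    long_term: list[str] = field(default_factory=list)  # every paragraph so far

def recall(memory: list[str], query: str, k: int) -> list[str]:
    # Crude stand-in for a vector search: rank by number of shared lowercase words.
    q = set(query.lower().split())
    return sorted(memory, key=lambda p: len(q & set(p.lower().split())), reverse=True)[:k]

LLM = Callable[[str], tuple[str, str, list[str]]]

def step(state: State, llm: LLM, choose_plan: Callable[[list[str]], str],
         n_plans: int = 3, k: int = 2) -> State:
    """One recurrent step: fill the template from the state, call the LLM, update memories."""
    prompt = PROMPT_TEMPLATE.format(
        long_term="\n".join(recall(state.long_term, state.plan, k)) or "(none)",
        short_term=state.short_term,
        content=state.content or "(none)",
        plan=state.plan,
        n_plans=n_plans,
    )
    paragraph, new_short_term, plans = llm(prompt)
    return State(
        content=paragraph,
        plan=choose_plan(plans),  # a human (or a heuristic) picks or rewrites a plan
        short_term=new_short_term,
        long_term=state.long_term + [paragraph],
    )

def fake_llm(prompt: str) -> tuple[str, str, list[str]]:
    # Stand-in for ChatGPT: writes a "paragraph" that just echoes the plan.
    plan = prompt.split("Plan for the next paragraph:\n")[1].split("\n")[0]
    return f"Paragraph about: {plan}", f"So far: {plan}", [f"{plan} (continued)", "A twist"]

state = State(content="", plan="The knight sets out", short_term="Story start.")
for _ in range(3):
    state = step(state, fake_llm, choose_plan=lambda plans: plans[0])
print(state.content)
# → Paragraph about: The knight sets out (continued) (continued)
```

Passing `plans[0]` makes the loop fully automatic; replacing `choose_plan` with an interactive prompt gives the human-in-the-loop plan selection that the summary describes.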

  1.
  2.
  3.
  4. See, for example, Overcoming the vanishing gradient problem in plain recurrent networks. See, in particular, my brief overview of the first version of that paper: Understanding Recurrent Identity Networks.

  5. They specifically use the SentenceTransformers Python framework, which was developed from the paper Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks.

  6. See files and 2_RecurrentGPT.tex in​with-GPT-4/​PaperUnderstanding/​FurtherExploration

  7. \baby is RecurrentGPT in this LaTeX text.
