Exploring the Multiverse of Large Language Models

[Crossposted from Medium.com.]

Credit: Alexas_Fotos from Pixabay

Large language models can generate text for many different tasks, for example question answering, machine translation, and document summarization. Beyond being simply right or wrong, large language models also exhibit some interesting behaviors, such as creativity, hallucination, and degeneration. This post explores the hypothesis that a large language model is a multiverse with the following properties:

  1. The multiverse contains different generation scenarios, e.g., right, wrong, creative, hallucinated, and degenerate ones.

  2. Decoding strategies can help us jump into different universes, but might not be able to reach the target universe.

  3. Navigating strategies can lead us toward the universe we prefer, but might not be able to guarantee that everything is in the right place.

  4. Multiple universes could satisfy a given task, so developing a universal decoding/navigating strategy might be difficult.

If the above description is too vague, here are some tips:

  • An auto-regressive large language model generates a probability distribution of the next token over the token space, implying the good, the bad, and the ugly of the future (see the sketch after this list).

  • Decoding strategies refer to greedy sampling, top-k sampling, top-p sampling, temperature setting, etc., which are used to select the next token.

  • Navigating strategies refer to reinforcement learning, knowledge augmentation, prompt engineering, etc., which are used to assist the overall process of selecting the next tokens.
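
To make the first tip concrete, here is a minimal sketch (assuming the Hugging Face transformers library and the public gpt2 checkpoint) of how an auto-regressive model produces a probability distribution over the next token for a given context:

```python
# A minimal sketch: inspect the next-token distribution of a small public model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("Doctor Stephen Strange goes to", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # scores for the next token only

probs = torch.softmax(logits, dim=-1)       # a distribution over the whole token space

# The top candidates hint at the good, the bad, and the ugly of the future.
top = torch.topk(probs, 5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p.item():.3f}")
```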

This post sketches the concept of the multiverse in large language models and some decoding/navigating strategies for it. Now let’s jump into Avengers: Endgame to get a taste of multiple universes.

Avengers: Endgame

  • Scene 1: Scott Lang discusses the idea of navigation with Steve Rogers and Natasha Romanoff.

  • Scene 2: Bruce Banner, Steve Rogers, and Natasha Romanoff help Scott Lang attempt time travel.

  • Scene 3: Tony Stark solves time travel through a simulation of the Möbius strip.

  • Scene 4: Tony Stark brings the navigation device to Steve Rogers.

Now we have some sense of the multiverse in Avengers: Endgame, and how it can go right or wrong. But how does it relate to the multiverse of large language models?

Toy Examples of Language Modeling

To illustrate the multiverse of a large language model, consider the following scenarios in Figure 1.

Figure 1. Creativity, Hallucination, and Degeneration

The creative, hallucinated, and degenerate texts in the above example are generated with a simple setup:

  • In the training phase, the toy language model summarizes the statistics of which words follow the current word.

  • In the inference phase, the next word is selected based on the current word using a greedy sampling mechanism (see the sketch after this list).
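
Here is a minimal sketch of such a toy language model (the corpus is invented for illustration); note how greedy sampling can lock the generation into a repetition loop, i.e., degeneration:

```python
# A toy language model: train by counting next-word statistics (a bigram table),
# infer by greedily picking the most frequent next word.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat ran".split()

# "Training": summarize which words follow the current word.
counts = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

# "Inference": greedy sampling always takes the most frequent next word.
word, output = "the", ["the"]
for _ in range(8):
    if word not in counts:
        break
    word = counts[word].most_common(1)[0][0]
    output.append(word)

print(" ".join(output))  # "the cat sat on the cat sat on the" -- degeneration
```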

My interpretation of the toy example is described in “A Simple Analysis of Hallucinations in Language Models” and “A Simple Analysis of the Repetition Problem in Text Generation”. A more thorough experiment on text generation can be found in the Hugging Face blog post “How to generate text: using different decoding methods for language generation with Transformers.”

Multiverse in a Large Language Model

Figure 2 illustrates an example of a GPT-based language generation process: a pre-trained/fine-tuned large language model takes the input “Doctor Stephen Strange goes to” and creates some outputs. The training process of such a large language model can be found in “State of GPT”. The interesting scenario here is that the probability distribution of the next token (W616/W838/W16828) leads Doctor Stephen Strange into different universes (W616 to Bleecker Street / W838 to the Illuminati / W16828 to see Thor). How does the large language model select the world to go to? Well, the large language model only creates the multiverse; selecting a world is done by the decoding strategy, such as greedy sampling, top-k sampling, top-p sampling, or temperature setting (a short sketch follows Figure 2).

Figure 2. GPT-based language generation process.
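
Here is a minimal sketch (again assuming transformers and the public gpt2 checkpoint) of how different decoding strategies select different worlds from the same multiverse; the sampled outputs will of course vary from run to run:

```python
# A minimal sketch: the same context, four decoding strategies, four universes.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Doctor Stephen Strange goes to", return_tensors="pt")

# Greedy: always follow the single most probable token.
greedy = model.generate(**inputs, max_new_tokens=10, do_sample=False)

# Top-k: sample among the 50 most probable tokens.
top_k = model.generate(**inputs, max_new_tokens=10, do_sample=True, top_k=50)

# Top-p (nucleus): sample from the smallest set of tokens covering 90% probability.
top_p = model.generate(**inputs, max_new_tokens=10, do_sample=True, top_p=0.9, top_k=0)

# Temperature: values above 1.0 flatten the distribution, below 1.0 sharpen it.
warm = model.generate(**inputs, max_new_tokens=10, do_sample=True, temperature=1.5)

for out in (greedy, top_k, top_p, warm):
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```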

Can the decoding strategy help us reach the target universe? A philosophical answer is that it depends on the madness of the multiverse, or the multiverse of madness. In reality, what the decoding strategy reaches is a world most likely associated with the current context. To complicate matters further, multiple universes might be great for a given task, as illustrated in Figure 3. Consequently, we need mechanisms that can work at both the micro level and the macro level.

Figure 3. Multiverse for a closed question and an open question.

How do we navigate the multiverse of a large language model? In other words, given a specific task, how do we reach the target universe(s) with everything in the right place? Reinforcement learning, knowledge augmentation, and prompt engineering offer some interesting insights.

1. Reinforcement Learning

Reinforcement learning is a machine learning paradigm in which an agent learns to make a sequence of decisions so as to maximize long-term rewards. Consider two scenarios of reinforcement learning shown in Figure 4:

  • Figure 4-a shows the case of playing a ping-pong game, where an agent learns to take actions according to a given state with the goal of increasing the probability of winning;

  • Figure 4-b shows the case of text generation, where an agent (i.e., a large language model) learns to generate tokens according to a given context with the goal of increasing the quality of the generated content.

Figure 4. (a) Deep RL Bootcamp Lecture 4B Policy Gradients Revisited
Figure 4. (b) Transformer Reinforcement Learning Library

There are many tough issues in applying reinforcement learning to large language models. For example, a reward could represent different notions of content quality [ref: ChatGPT/RLHF, GPT-4/Safety]; a reward could be given at the end of a generated text or during the generation process [ref: outcome supervision vs. process supervision]; and a reward could be created by humans or by classifiers [ref: controlled sentiment reviews].
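
As a concrete illustration, here is a hedged sketch of one PPO step in the style of the TRL library’s quickstart; the model name, reward value, and configuration are illustrative assumptions, and the exact API differs across TRL versions:

```python
# A sketch of one RLHF-style PPO step following the TRL quickstart pattern.
# Details (PPOConfig arguments, generate/step signatures) vary across TRL versions.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

# The policy model (with a value head for PPO) and a frozen reference copy.
model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

ppo_trainer = PPOTrainer(PPOConfig(batch_size=1), model, ref_model, tokenizer)

# The agent (the language model) generates a trajectory of tokens for a context.
query = tokenizer.encode("Doctor Stephen Strange goes to", return_tensors="pt")[0]
response = ppo_trainer.generate(query, max_new_tokens=10)[0][len(query):]

# A scalar reward at the end of the text; in practice it would come from human
# feedback or a learned reward model/classifier, and its meaning is a design choice.
reward = torch.tensor(1.0)

# One PPO step nudges the policy toward higher-reward trajectories, while the
# reference model keeps the policy from drifting too far from its origin.
stats = ppo_trainer.step([query], [response], [reward])
```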

The analogy between reinforcement learning and multiverse navigation is that we train a large language model to travel many trajectories with a sense of purpose so that hopefully and eventually “we could enter the quantum realm at a certain point in time but then exit the quantum realm at another point in time” (quoted from Scott Lang in Avengers: Endgame.)

2. Knowledge Augmentation

Integrating external knowledge from the Internet or a private database with a large language model is a major trend for overcoming the limitations of a standalone model and deploying it in real-life environments. For example, Microsoft Bing Chat and Google Bard are two well-known systems integrating large language models with the Internet, while LangChain has become a popular platform for integrating large language models with the Internet and with different databases. With external knowledge in a prompt, a large language model has a specific context, or a reference point, from which to travel the multiverse. Figure 5 shows the process of using knowledge augmentation as the navigation device (a small sketch follows the figure).

Figure 5. Knowledge augmentation as the navigation device.
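
Here is a minimal sketch of the idea; retrieve() and complete() are hypothetical stand-ins for a real search or database call and a real large language model call:

```python
# A minimal sketch of knowledge augmentation: retrieved text becomes the
# reference point in the prompt. retrieve() and complete() are hypothetical.
def retrieve(query: str) -> str:
    """Stand-in for a web search or a private-database lookup."""
    return "Doctor Strange's Sanctum Sanctorum is at 177A Bleecker Street."

def complete(prompt: str) -> str:
    """Stand-in for a call to a large language model."""
    return "... a continuation grounded in the context above ..."

question = "Where does Doctor Stephen Strange go?"
context = retrieve(question)

# The external knowledge in the prompt acts as the navigation device.
prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
print(complete(prompt))
```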

3. Prompt Engineering

Prompt engineering can take many forms to instruct a large language model to generate meaningful content. Figure 6 shows the process of using prompt engineering as the navigation device, along with some example techniques, e.g., Chain of Thought, Think Step by Step, Explain Like I’m Five, Role Play, etc. (a small sketch follows the figure).

Figure 6. Prompt engineering as the navigation device.
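
Here is a minimal sketch of prompt engineering as a navigation device: the same question wrapped in different instructions steers the model toward different universes. The templates are illustrative, and complete() is again a hypothetical stand-in for a large language model call:

```python
# A minimal sketch: one question, several prompt-engineering wrappers.
TEMPLATES = {
    "chain_of_thought":   "Q: {question}\nA: Let's reason through this step by step.",
    "think_step_by_step": "{question}\nThink step by step.",
    "eli5":               "Explain like I'm five: {question}",
    "role_play":          "You are a Marvel historian. {question}",
}

question = "Why does Doctor Stephen Strange go to Bleecker Street?"
for name, template in TEMPLATES.items():
    prompt = template.format(question=question)
    print(f"--- {name} ---\n{prompt}\n")
    # response = complete(prompt)  # send to the large language model of your choice
```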

It is an art and a science to understand how a specific trick works and why it works. Here are some explanations from Google Bard.

The prompt “Think Step by Step” works in large language models (LLMs) because it tells the model that the user wants a logical and step-by-step explanation of the topic at hand. This is important because LLMs are trained on a massive dataset of text, which includes a lot of information that can be presented in a complex and non-linear way. If the user does not provide any guidance, the LLM may generate an explanation that is difficult to follow or understand.

The “Think Step by Step” prompt helps the LLM to focus on the main points of the topic and to present them in a logical order. This can be helpful for users who are trying to learn something new or who need to understand a complex topic in detail.

The prompt “Explain Like I’m Five” (ELI5) works in large language models (LLMs) because it tells the model that the user wants a simplified explanation of the topic at hand. This is important because LLMs are trained on a massive dataset of text, which includes a lot of complex and technical information. If the user does not provide any guidance, the LLM may generate an explanation that is too complex or difficult to understand.

The ELI5 prompt helps the LLM to focus on the most important aspects of the topic and to use simple language that is easy for a child to understand. This can be helpful for users who are not familiar with the topic or who do not have a lot of technical expertise.

Conclusions

The multiverse hypothesis of large language models is open to multiple interpretations. Please offer your thoughts on this topic. Thanks.
