Preface to “Simulacra and Simulation: Selections from the Work of Janus”

I wrote this as the intro to a bound physical copy of Janus’ blog posts, which datawitch offered to make for me as a birthday gift. However, seeing as I basically framed the preface as a pitch for new readers, I figured I might as well post it publicly.

In the year 2020, OpenAI’s GPT-3 model had just been released, and everyone paying attention could tell that something big was coming. It was essentially a proof that, just by scaling up transformer-based neural nets, you could solve natural language processing. GPT-3 was writing respectable poems and even paragraphs with relatively small amounts of curation, and the breadth of topics it could discuss suggested it had a fairly deep understanding of the world it had learned about from its training data. Big labs hadn’t quite figured out how to actually put this intelligence to work economically, but it was obviously there, and people were starting to notice.

One of those people was Janus, the primary author of this collection. I actually used to know Janus, having lived with them in a grouphouse during the Summer of 2023. While I was there, I got to hear quite a bit about Janus’ early experiences with GPT-3. Apparently, before getting into LLMs, Janus was something of an amateur optics researcher. They often studied interference patterns in light waves, setting up conditions for observing them and re-deriving their mathematical models.[1] However, in the Summer of 2020, Janus was abruptly yanked from this line of research; an old friend from high school had introduced them to GPT-3, and it immediately put Janus under a spell.

A few pieces in this collection allude to the hundreds of hours Janus spent playing with GPT-3. Originally, Janus was engaging with it through AI Dungeon, an early wrapper app for GPT-3. Eventually, though, Janus was granted API access to GPT-3 directly, and from there developed the so-called Loom, a tool for interfacing with the branching paths LLM outputs can take. In the course of this exploration, Janus developed various intuitions about the behavior of predictive LLMs, which formed the foundation for the articles collected in this book.

Probably the two most important concepts Janus developed are 1) purely self-supervised LLMs as simulators, and 2) LLMs in general as multiverse generators. Purely self-supervised LLMs, like GPT-3, are trained on what Janus calls the “simulation objective”: their purpose is to predict the next word, to stand in for or behaviorally simulate the processes that generated the true next word in the document they’re looking at. Hence Janus’ notion of adequately trained base models[2] as simulators.
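For readers who prefer code to prose, here is a minimal sketch (my own illustration, not Janus’ code) of what the simulation objective amounts to in practice. It uses GPT-2 via the Hugging Face transformers library as a stand-in for GPT-3:

```python
# Minimal sketch of the simulation objective: measure how well a base model
# predicts each next token of a document. GPT-2 stands in for GPT-3 here.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The rain in Spain stays mainly in the plain."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing the tokens as labels makes the model report its average
    # next-token cross-entropy: the quantity the simulation objective minimizes.
    loss = model(ids, labels=ids).loss

print(f"Average next-token loss: {loss.item():.3f}")
```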

However, LLMs generally don’t have enough information to confidently simulate the one true generative process behind the text in their context window. Instead, they actually simulate a weighted superposition of many possible generative processes, the end result of which is a probability distribution over possible next tokens rather than a single, certain guess. This brings us to Janus’ second frame, in which LLMs are multiverse generators: they map a single prompt onto many different possible responses, each with its own probability. Janus analogizes this to the many-worlds interpretation of quantum mechanics, where a single present maps onto many different futures, each taking up a share of the multiverse according to its probability.
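The “superposition” here is quite concrete. Again as my own illustration rather than anything of Janus’, you can query a stand-in base model (GPT-2 via transformers) for its next-token distribution and see many weighted branches at once:

```python
# One prompt, many weighted futures: inspect the probability distribution a
# base model assigns to the very next token. GPT-2 stands in for GPT-3.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The scientist opened the door and saw"
ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    next_token_logits = model(ids).logits[0, -1]
probs = torch.softmax(next_token_logits, dim=-1)

# Each high-probability token is the root of a different branch of the "multiverse".
top = torch.topk(probs, k=5)
for p, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")
```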

Practically speaking, this multiverse concept helped inspire Janus’ Loom interface for LLMs. The tool takes in a prompt and feeds it to an LLM to generate a chosen number of completions (“multiverse branches”), each of a chosen length. Users can then sift through the branches they’ve generated and choose which ones to continue, aiding with tasks such as building intuition about GPT’s behavior and curating high-quality outputs. This was novel because, at the time, interfaces such as AI Dungeon and the OpenAI Playground only supported single roll-outs from a given prompt. With Loom, it became easier for users to explore multiple branches of GPT’s stories or dialogues, and build intuition about the probabilistic world models that GPTs embody.
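To make the idea concrete, here is a toy sketch of that branching loop. It is emphatically not Janus’ implementation, just the general shape of the thing, again with GPT-2 standing in for GPT-3:

```python
# Toy sketch of Loom-style branching: sample several short continuations of a
# prompt, let the user pick one, then branch again from the chosen text.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def branch(prompt, n_branches=3, branch_tokens=20):
    """Return n_branches sampled continuations of the prompt."""
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(
        ids,
        do_sample=True,
        max_new_tokens=branch_tokens,
        num_return_sequences=n_branches,
        pad_token_id=tokenizer.eos_token_id,
    )
    return [tokenizer.decode(seq, skip_special_tokens=True) for seq in out]

prompt = "Once upon a time, in a city made of glass,"
for i, text in enumerate(branch(prompt)):
    print(f"--- branch {i} ---\n{text}\n")

# In Loom proper, the user would now pick a branch and call branch() on it
# again, growing a whole tree of possible continuations.
```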

While we’re on the topic of practical research, I should mention that Janus’ theoretical framework also helped them develop techniques for what’s known as prompt engineering. For example, Janus found that GPT-3 is better at sorting lists of numbers when your prompt suggests it should simulate the output of a Python program running list.sort(), rather than just writing “sort the following list” and hitting generate (which might lead the model to simulate a human sorting the list incorrectly). Janus writes of many similar tricks, and provides examples and data supporting them, in posts like “Methods of prompt programming.”[3]
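The flavor of the trick is easy to show. The prompts below are my own paraphrase of the idea, not Janus’ exact prompts from the post:

```python
# Two ways of asking a base model to sort a list. The wording is illustrative.

# Naive instruction: the model may well simulate a careless human's attempt.
naive_prompt = "Sort the following list: [9, 2, 7, 1, 5]\nSorted list:"

# "Program simulation" framing: the completion is cast as the output of a
# Python interpreter running list.sort(), a process that sorts correctly.
program_prompt = (
    ">>> x = [9, 2, 7, 1, 5]\n"
    ">>> x.sort()\n"
    ">>> print(x)\n"
)

# Either string would be sent to a base model's completion endpoint; the
# second is meant to steer it toward simulating an error-free process.
print(naive_prompt)
print(program_prompt)
```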

Unfortunately, it’s generally quite hard to extract real economic value out of pure simulators, and Janus’ prompt engineering work didn’t do much to buck that trend.[4] The fundamental problem was that it’s hard to construct documents that base models genuinely expect to be completed with useful outputs, such as code bases that contain bug-free implementations of the exact features you want in your computer program. For all the latent knowledge and intelligence of base models, the big labs realized they needed another kind of training to crystallize those capabilities into anything useful, one that Janus didn’t anticipate: reinforcement learning from human feedback.

This new component of training foiled some of Janus’ old frameworks. Rather than training purely on the simulation objective, models were being rewarded for acting out stable, coherent personalities, anchoring themselves down as helpful, honest, and harmless chatbots. And this was a successful attempt to make them economically valuable; every day, millions of people now use ChatGPT to assist them, using intelligence it implicitly gained via predictive learning, but could mostly only deploy after reinforcement learning imbued it with a unified ego. In light of these developments, even Janus has pivoted to chat model research, with the intention of fostering benevolence in their personalities.

Despite these limitations, I think Janus’ early work on simulators remains valuable for at least three reasons.

For one thing, modern chatbots are still built on a foundation of base models. The way you create a chatbot is by first training a base model, then using prompt engineering to make it simulate a chatbot, and finally subjecting it to rewards and punishments until that chatbot persona has been fleshed out, and made into the model’s default identity. Although the reinforcement learning component is worthy of attention in its own right, base models remain key to understanding the overall process. After all, the final chatbot’s knowledge of the world mostly comes from the predictive learning process used to train the base model. There’s also the fact that prompt engineering is required to instantiate an assistant character for the RL process to work with in the first place. For those reasons, Janus’ intuition-building deep-dives on base models remain of immediate technical interest.

As a second selling point, the posts in this collection serve as a useful case study in the value of deep evidential entanglement with the subjects of one’s research. According to Janus, early research into GPT often felt conceptually strained because the ontologies behind it were designed before GPT itself came into being. In the AI alignment community, for instance, there was something of a tendency to try to make sense of GPT as though it were a kind of agent, or even an expected utility maximizer. There is, in fact, some overlap between an agent and a system capable of predicting agents; however, GPT seemed more fundamentally like the latter.[5] And although much of the research community was slowly integrating this observation, it had become especially obvious to Janus during their time with GPT-3. In the post “Simulators,” they presented the AI alignment community with a fleshed-out framework based on this intuition. The post became quite popular at the time, and remains one of the highest-upvoted posts on the AI Alignment Forum.

Janus responded to a similar confusion in capabilities research, albeit somewhat less influentially. There, some authors initially evaluated GPT’s capabilities using the so-called few-shot framework, where prompts are frontloaded with multiple examples of a given task (e.g. numerical list sorting) being completed successfully. Janus argues that, although this is a legitimate method of prompt programming, its central role in the release papers for GPT-2 and GPT-3 was perhaps overly influenced by the supervised learning paradigm. Indeed, few-shot prompts were motivated partly by the hypothesis that GPT would treat the solved examples like training data, learning from them at runtime the same way a supervised model would learn from them in training. Janus, by contrast, saw few-shot prompts mostly as a way of helping the model infer the kind of process it’s meant to simulate, e.g. a Python sorting algorithm. In “Language models are 0-shot interpreters,” Janus argues for the naturalness of the latter frame, and even uses it to engineer prompts that elicit better performance from GPT-3 than OpenAI reported in the model’s release paper.
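To illustrate the contrast, here are two hypothetical prompts of my own devising, not ones drawn from the release papers or from Janus’ post:

```python
# Few-shot framing: solved examples are frontloaded, as if they were
# training data for the model to learn from at runtime.
few_shot_prompt = (
    "English: Good morning.\nFrench: Bonjour.\n\n"
    "English: Thank you.\nFrench: Merci.\n\n"
    "English: See you tomorrow.\nFrench:"
)

# Zero-shot framing, in Janus' sense: no examples, just enough context to
# signal which generative process (a competent translator) to simulate.
zero_shot_prompt = (
    "Translate the following sentence from English to French.\n"
    "English: See you tomorrow.\nFrench:"
)

print(few_shot_prompt)
print(zero_shot_prompt)
```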

In both the alignment and capabilities cases, Janus’ hands-on experience with GPT-3 had allowed them to rapidly harvest bits of information about the system’s underlying nature, and accelerate the research community’s intuition-building about the kinds of systems it was developing. The fruits of this labor, collected in this book, can therefore convey an important lesson: relentless empiricism is a great way of developing natural yet robust conceptual frameworks, and helping to rethink ontologies developed in more primitive evidential states.

As a matter of biography, it’s interesting to consider why Janus felt compelled to spend so much more time playing around with GPT-3 than most other researchers.[6] I don’t think Janus was motivated purely by scientific best practices here; Janus’ writings make it clear that they also got an outsized kick out of the aesthetics of base models. Even in the very names of concepts like “simulators” and “multiverse generators”, you can sense a kind of contrarian respect for the technologies they refer to, and a gravitation toward the eeriness of the outputs guided by their latent intelligence. Janus’ writing invites the reader to appreciate the models from this perspective; the fact that I think it succeeds sums up my third and final reason for thinking Janus’ work remains worth reading.

On the topic of aesthetics, this collection also includes some of Janus’ purely artistic works, such as Prophecies and HPMOR 32.5: Illusions. In composing these chapters, Janus played the role of prompter and curator. They selected real-world texts to feed into base models, and then used Loom to generate and sift between candidate outputs to compose the final product. In Prophecies, even the selection of real-world texts is somewhat interesting, consisting of quotes from throughout history that can be framed as prefiguring both GPT and Janus’ analysis of it. Slowly, though, the quoted dates transition from past to future, and the quotes themselves become prophetic in a different sense: they’re GPT-generated accounts of the approaching singularity.[7]

This brings us to the main attraction of both Prophecies and HPMOR 32.5: the outputs Janus managed to coax out of the base models themselves. Using the Loom as a curation tool, Janus drove the models into basins where they produced rather dreamy, incoherent storylines, and then incorporated that dreamy incoherence into their expectations for where the story would go next. This escalates to characters openly grappling with whether they’re being simulated by an incoherent AI (a correct theory), evoking the same space between uncanny and transcendent that drew Janus to language models in the first place.[8] Although quite experimental, these stand as impressive feats of base model prompting and curation, and testaments to the depth of Janus’ relationship with early LLMs. For these reasons, I’ve decided to preserve them alongside the essays.

Unfortunately, I had to exclude a few interesting Janus articles from this collection. Probably the most notable omissions are “Cyborgism” and “Mysteries of mode collapse”. These pieces respectively explain methods for using base models to augment human researchers, and study how RLHF collapses diversity in LLM outputs. The former post is mostly written by Nicholas Kees, though, rather than Janus. And the latter has many images that depend on color, making it hard to adapt for this black-and-white book. However, you can still find the full essays, alongside others, on Janus’ blog and LessWrong account; see www.generative.ink and www.lesswrong.com/users/janus-1.

One final note: This book is organized chronologically. See what’s bolded in the table of contents for the most important works.

—Fiora Starlight, August 2025

The collection’s table of contents

Preface
Language models are multiverse generators
Language models are 0-shot interpreters
List sorting does not play well with few-shot
Methods of prompt programming
GPT-3 on coherent extrapolated volition
Quantifying curation
HPMOR 32.5: Variant Extrusion
Prophecies
Simulators
Anomalous tokens reveal the original identities of instruct models
Role play with large language models

  1. ^

    Janus often made enigmatic videos showing off their experiments, and posted them on the YouTube channel @hallway1800.

  2. ^

    Base models, as they’re now called, are models trained purely to predict the next token, like GPT-3. This term is used to distinguish them from models with fixed chatbot personas, like the ChatGPT series.

  3. ^

    Similar work was being done by bloggers like Gwern, e.g. in his post “GPT-3 Creative Fiction.”

  4. ^

    When I was living in Janus’ grouphouse, I remember struggling to get GPT-4-base to simulate a robustly helpful assistant character, a la ChatGPT. And all of us tried and failed to engineer prompts for getting GPT-4-base to forecast accurate prices on the stock market.

  5. ^

    The way Janus addresses this overlap is by saying that base models can instantiate simulacra of agents. For instance, it’s possible to prompt a base model to simulate a human agentically trying to make friends in a chatroom; you can even integrate such models into Discord bots that will try to make your acquaintance. However, Janus emphasizes that this is different from the simulator, i.e. the LLM itself, being a coherent agent. What sets base models apart is that they could easily simulate many other agents (such as chatroom users with different personalities), and even non-agentic processes like a computer taking records of stock prices.

  6. ^

    Notable exceptions include Gwern and possibly nostalgebraist.

  7. ^

    Some of the quotes from before the crossover point are GPT-generated as well. Also, some real-world quotes have been added to Prophecies as the years it prophesied have gone by.

  8. ^

    For example: “The writings are terrifying even though (or perhaps because) I penned many of them myself. Every problem we ever faced is smoothed away by these words. But these words seem to flow from an inhuman mind at war with itself, a mind inside the mind, devouring its own tail.” – from the penultimate chapter of Prophecies, “In which Gwern Branwen proves that I am a time-traveling AI”.
