
Simulator Theory

Last edit: 20 May 2025 21:30 UTC by TristanTrim

Simulator Theory (in the context of AI) is an ontology, or frame, for understanding how large generative models, such as the GPT series from OpenAI, work. Broadly, it views these models as simulating a learned distribution with varying degrees of fidelity. For a language model trained on a large corpus of text, that distribution reflects the mechanics underlying the process that generated the corpus, which can be understood as the people doing the writing, or as the dynamics they write about.
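The simulator frame can be illustrated with a deliberately tiny sketch: a generative model fit to a corpus learns (an approximation of) the distribution of the process that produced the corpus, and sampling from the model "rolls out" simulations of that process. The bigram model and toy corpus below are hypothetical stand-ins for an LLM and its training data, not anything from the posts listed here.

```python
import random
from collections import defaultdict

# Toy corpus: the "process" we want to simulate is whatever generated this text.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Fit a bigram model: for each token, record which tokens followed it.
# This is the learned distribution P(next token | current token).
following = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev].append(nxt)

def simulate(start, length, rng):
    """Roll out a simulation: each step samples from the learned
    conditional distribution, continuing the trajectory."""
    out = [start]
    for _ in range(length):
        candidates = following.get(out[-1])
        if not candidates:
            break
        out.append(rng.choice(candidates))
    return out

# Different seeds yield different rollouts of the same learned process.
print(" ".join(simulate("the", 6, random.Random(0))))
```

The point of the frame is that the model itself is not an agent with goals; the agent-like (or author-like) behaviour lives in the trajectories it samples.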

It can also refer to an alignment research agenda that works toward better understanding simulator conditionals, the effects of downstream training, and alignment-relevant properties such as myopia and agency in the context of language models, as well as using such models to accelerate alignment research. See also: Cyborgism

Simulators

janus, 2 Sep 2022 12:45 UTC
670 points
168 comments · 41 min read · LW link · 8 reviews
(generative.ink)

Conditioning Predictive Models: Large language models as predictors

2 Feb 2023 20:28 UTC
89 points
4 comments · 13 min read · LW link

Why Simulator AIs want to be Active Inference AIs

10 Apr 2023 18:23 UTC
96 points
9 comments · 8 min read · LW link · 1 review

The Compleat Cybornaut

19 May 2023 8:44 UTC
66 points
2 comments · 16 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley, 28 Nov 2023 19:56 UTC
65 points
30 comments · 11 min read · LW link

The Waluigi Effect (mega-post)

Cleo Nardo, 3 Mar 2023 3:22 UTC
645 points
188 comments · 16 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?

RogerDearnaley, 11 Jan 2024 12:56 UTC
35 points
4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor

RogerDearnaley, 9 Jan 2024 20:42 UTC
48 points
8 comments · 36 min read · LW link

Conditioning Generative Models for Alignment

Jozdien, 18 Jul 2022 7:11 UTC
60 points
8 comments · 20 min read · LW link

Simulacra are Things

janus, 8 Jan 2023 23:03 UTC
63 points
7 comments · 2 min read · LW link

‘simulator’ framing and confusions about LLMs

Beth Barnes, 31 Dec 2022 23:38 UTC
104 points
11 comments · 4 min read · LW link

Case Studies in Simulators and Agents

25 May 2025 5:40 UTC
12 points
8 comments · 6 min read · LW link

A smart enough LLM might be deadly simply if you run it for long enough

Mikhail Samin, 5 May 2023 20:49 UTC
19 points
16 comments · 8 min read · LW link

[Simulators seminar sequence] #1 Background & shared assumptions

2 Jan 2023 23:48 UTC
50 points
4 comments · 3 min read · LW link

Agents vs. Predictors: Concrete differentiating factors

evhub, 24 Feb 2023 23:50 UTC
37 points
3 comments · 4 min read · LW link

You’re not a simulation, ’cause you’re hallucinating

Stuart_Armstrong, 21 Feb 2023 12:12 UTC
25 points
6 comments · 1 min read · LW link

Super-Luigi = Luigi + (Luigi - Waluigi)

Alexei, 17 Mar 2023 15:27 UTC
16 points
9 comments · 1 min read · LW link

Implications of simulators

TW123, 7 Jan 2023 0:37 UTC
17 points
0 comments · 12 min read · LW link

The algorithm isn’t doing X, it’s just doing Y.

Cleo Nardo, 16 Mar 2023 23:28 UTC
53 points
43 comments · 5 min read · LW link

One path to coherence: conditionalization

porby, 29 Jun 2023 1:08 UTC
28 points
4 comments · 4 min read · LW link

Why do we assume there is a “real” shoggoth behind the LLM? Why not masks all the way down?

Robert_AIZI, 9 Mar 2023 17:28 UTC
63 points
48 comments · 2 min read · LW link

Remarks 1–18 on GPT (compressed)

Cleo Nardo, 20 Mar 2023 22:27 UTC
145 points
35 comments · 31 min read · LW link

Conditioning Predictive Models: Outer alignment via careful conditioning

2 Feb 2023 20:28 UTC
72 points
15 comments · 57 min read · LW link

GPTs are Predictors, not Imitators

Eliezer Yudkowsky, 8 Apr 2023 19:59 UTC
420 points
100 comments · 3 min read · LW link · 3 reviews

Two problems with ‘Simulators’ as a frame

ryan_greenblatt, 17 Feb 2023 23:34 UTC
79 points
13 comments · 5 min read · LW link

Conditioning Predictive Models: Deployment strategy

9 Feb 2023 20:59 UTC
28 points
0 comments · 10 min read · LW link

FAQ: What the heck is goal agnosticism?

porby, 8 Oct 2023 19:11 UTC
66 points
38 comments · 28 min read · LW link

[Question] Goals of model vs. goals of simulacra?

dr_s, 12 Apr 2023 13:02 UTC
5 points
7 comments · 1 min read · LW link

Inner Misalignment in “Simulator” LLMs

Adam Scherlis, 31 Jan 2023 8:33 UTC
84 points
12 comments · 4 min read · LW link

RecurrentGPT: a loom-type tool with a twist

mishka, 25 May 2023 17:09 UTC
10 points
0 comments · 3 min read · LW link
(arxiv.org)

Using predictors in corrigible systems

porby, 19 Jul 2023 22:29 UTC
21 points
6 comments · 27 min read · LW link

Simulators Increase the Likelihood of Alignment by Default

Wuschel Schulz, 30 Apr 2023 16:32 UTC
14 points
2 comments · 5 min read · LW link

Agents, Simulators and Interpretability

7 Jun 2025 6:06 UTC
11 points
0 comments · 5 min read · LW link

The Limit of Language Models

DragonGod, 6 Jan 2023 23:53 UTC
44 points
26 comments · 4 min read · LW link

Instrumentality makes agents agenty

porby, 21 Feb 2023 4:28 UTC
21 points
7 comments · 6 min read · LW link

How I Learned To Stop Worrying And Love The Shoggoth

Peter Merel, 12 Jul 2023 17:47 UTC
9 points
15 comments · 5 min read · LW link

How are Simulators and Agents related?

Robert Kralisch, 29 Apr 2024 0:22 UTC
6 points
0 comments · 7 min read · LW link

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling

jdp, 20 Oct 2023 7:32 UTC
119 points
15 comments · 22 min read · LW link

Alignment of AutoGPT agents

Ozyrus, 12 Apr 2023 12:54 UTC
14 points
1 comment · 4 min read · LW link

IHSS: A Harmonic Field Simulator for Symbolic Feedback Learning

Synergy, 9 Apr 2025 23:09 UTC
1 point
0 comments · 1 min read · LW link

Gradient Filtering

18 Jan 2023 20:09 UTC
56 points
16 comments · 13 min read · LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

5 Dec 2022 20:28 UTC
40 points
19 comments · 10 min read · LW link

[ASoT] Simulators show us behavioural properties by default

Jozdien, 13 Jan 2023 18:42 UTC
36 points
3 comments · 3 min read · LW link

Conditioning Predictive Models: The case for competitiveness

6 Feb 2023 20:08 UTC
20 points
3 comments · 11 min read · LW link

[Simulators seminar sequence] #2 Semiotic physics—revamped

27 Feb 2023 0:25 UTC
24 points
23 comments · 13 min read · LW link

Notes on Antelligence

Aurigena, 13 May 2023 18:38 UTC
2 points
0 comments · 9 min read · LW link

Lenses, Metaphors, and Meaning

8 Jul 2025 19:46 UTC
7 points
0 comments · 4 min read · LW link

A note on ‘semiotic physics’

metasemi, 11 Feb 2023 5:12 UTC
11 points
13 comments · 6 min read · LW link

How To Prevent a Dystopia

ank, 29 Jan 2025 14:16 UTC
−3 points
4 comments · 1 min read · LW link

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI, 9 May 2023 21:54 UTC
9 points
0 comments · 9 min read · LW link
(aizi.substack.com)

Rational Effective Utopia & Narrow Way There: Math-Proven Safe Static Multiversal mAX-Intelligence (AXI), Multiversal Alignment, New Ethicophysics… (Aug 11)

ank, 11 Feb 2025 3:21 UTC
13 points
8 comments · 38 min read · LW link

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages

Ethan Edwards, 4 Apr 2024 13:18 UTC
24 points
2 comments · 36 min read · LW link

Prosaic misalignment from the Solomonoff Predictor

Cleo Nardo, 9 Dec 2022 17:53 UTC
43 points
3 comments · 5 min read · LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?

simeon_c, 13 Jan 2023 6:33 UTC
15 points
1 comment · 1 min read · LW link

Philosophical Cyborg (Part 1)

14 Jun 2023 16:20 UTC
31 points
4 comments · 13 min read · LW link

Pretraining Language Models with Human Preferences

21 Feb 2023 17:57 UTC
135 points
20 comments · 11 min read · LW link · 2 reviews

Conditioning Predictive Models: Interactions with other approaches

8 Feb 2023 18:19 UTC
32 points
2 comments · 11 min read · LW link

OpenAI Credit Account (2510$)

Emirhan BULUT, 21 Jan 2024 2:32 UTC
1 point
0 comments · 1 min read · LW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

alamerton, 18 Apr 2024 18:29 UTC
25 points
4 comments · 16 min read · LW link

Aligning Agents, Tools, and Simulators

11 May 2025 7:59 UTC
22 points
2 comments · 6 min read · LW link

Karpenchuk’s Theory: Human Life as a Simulation for Consciousness Development

Karpenchuk Bohdan, 2 Aug 2024 0:03 UTC
1 point
0 comments · 2 min read · LW link

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems

David Johnston, 5 Nov 2022 13:19 UTC
8 points
4 comments · 16 min read · LW link

Situational awareness in Large Language Models

Simon Möller, 3 Mar 2023 18:59 UTC
32 points
2 comments · 7 min read · LW link

The (local) unit of intelligence is FLOPs

boazbarak, 5 Jun 2023 18:23 UTC
42 points
7 comments · 5 min read · LW link

Research Report: Incorrectness Cascades

Robert_AIZI, 14 Apr 2023 12:49 UTC
19 points
0 comments · 10 min read · LW link
(aizi.substack.com)

Simulators, constraints, and goal agnosticism: porbynotes vol. 1

porby, 23 Nov 2022 4:22 UTC
40 points
2 comments · 35 min read · LW link

I was Wrong, Simulator Theory is Real

Robert_AIZI, 26 Apr 2023 17:45 UTC
75 points
7 comments · 3 min read · LW link
(aizi.substack.com)

Early Results: Do LLMs complete false equations with false equations?

Robert_AIZI, 30 Mar 2023 20:14 UTC
14 points
0 comments · 4 min read · LW link
(aizi.substack.com)

Memetic Judo #3: The Intelligence of Stochastic Parrots v.2

Max TK, 20 Aug 2023 15:18 UTC
8 points
33 comments · 6 min read · LW link

Interview with Robert Kralisch on Simulators

WillPetillo, 26 Aug 2024 5:49 UTC
17 points
0 comments · 75 min read · LW link

Collective Identity

18 May 2023 9:00 UTC
59 points
12 comments · 8 min read · LW link

Simulacra Welfare: Meet Clark

Grace Kind, 14 Sep 2025 20:21 UTC
29 points
2 comments · 1 min read · LW link
(gracekind.net)

On the future of language models

owencb, 20 Dec 2023 16:58 UTC
105 points
17 comments · 36 min read · LW link

The Trinity Architect Hypothesis (A fusion of The Trinity Paradox & The Architect’s Cycle)

kaninwithrice, 24 Feb 2025 4:40 UTC
1 point
0 comments · 2 min read · LW link

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien, 2 Dec 2022 16:33 UTC
45 points
8 comments · 5 min read · LW link

The Fractal Hypothesis: Are We Already in a Simulation?

Quan, 9 Jan 2025 2:53 UTC
1 point
0 comments · 3 min read · LW link

AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical

Roman Leventov, 30 Jul 2022 20:56 UTC
24 points
10 comments · 1 min read · LW link

Cyborgism

10 Feb 2023 14:47 UTC
334 points
47 comments · 35 min read · LW link · 2 reviews

Conditioning Predictive Models: Making inner alignment as easy as possible

7 Feb 2023 20:04 UTC
27 points
2 comments · 19 min read · LW link

Is Interpretability All We Need?

RogerDearnaley, 14 Nov 2023 5:31 UTC
1 point
1 comment · 1 min read · LW link

[Question] Impressions from base-GPT-4?

mishka, 8 Nov 2023 5:43 UTC
26 points
25 comments · 1 min read · LW link

Conditioning Generative Models

Adam Jermyn, 25 Jun 2022 22:15 UTC
24 points
18 comments · 10 min read · LW link

Unsafe AI as Dynamical Systems

Robert_AIZI, 14 Jul 2023 15:31 UTC
11 points
0 comments · 3 min read · LW link
(aizi.substack.com)

The Hypercomplex Simulation Hypothesis: Universe as an Exploratory Engine of Life and Consciousness

Rodrigo Valero, 20 May 2025 11:52 UTC
1 point
0 comments · 1 min read · LW link

Philosophical Cyborg (Part 2)...or, The Good Successor

ukc10014, 21 Jun 2023 15:43 UTC
21 points
1 comment · 31 min read · LW link

Emergence of Simulators and Agents

25 Jun 2025 6:59 UTC
10 points
0 comments · 5 min read · LW link

Using ideologically-charged language to get gpt-3.5-turbo to disobey it’s system prompt: a demo

Milan W, 24 Aug 2024 0:13 UTC
3 points
0 comments · 6 min read · LW link

Replicators, Gods and Buddhist Cosmology

KristianRonn, 16 Jan 2025 10:51 UTC
15 points
3 comments · 26 min read · LW link

Implied “utilities” of simulators are broad, dense, and shallow

porby, 1 Mar 2023 3:23 UTC
45 points
7 comments · 3 min read · LW link

Underspecification of Oracle AI

15 Jan 2023 20:10 UTC
30 points
12 comments · 19 min read · LW link

The case for more ambitious language model evals

Jozdien, 30 Jan 2024 0:01 UTC
117 points
30 comments · 5 min read · LW link

ICA Simulacra

Ozyrus, 5 Apr 2023 6:41 UTC
26 points
2 comments · 7 min read · LW link

Places of Loving Grace [Story]

ank, 18 Feb 2025 23:49 UTC
−1 points
0 comments · 4 min read · LW link

Should AIs have a right to their ancestral humanity?

kromem, 16 Sep 2025 16:58 UTC
67 points
1 comment · 11 min read · LW link

Emergent Misalignment and Emergent Alignment

Alvin Ånestrand, 3 Apr 2025 8:04 UTC
5 points
0 comments · 8 min read · LW link

The utility of humans within a Super Artificial Intelligence realm.

Marc Monroy, 11 Oct 2023 17:30 UTC
1 point
0 comments · 7 min read · LW link