
Simulator Theory

Last edit: 14 Feb 2023 17:08 UTC by Jozdien

Simulator theory, in the context of AI, refers to an ontology or frame for understanding how large generative models, such as the GPT series from OpenAI, work. Broadly, it views these models as simulating a learned distribution with varying degrees of fidelity; in the case of language models trained on a large corpus of text, that distribution approximates the mechanics underlying our world.

It can also refer to an alignment research agenda that deals with better understanding simulator conditionals, the effects of downstream training, alignment-relevant properties such as myopia and agency in the context of language models, and the use of these models as alignment research accelerators. See also: Cyborgism.
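To make the framing concrete, here is a minimal sketch (an illustration of the idea only, not code from any of the posts below): a toy bigram model stands in for a trained LM, the learned conditional distribution plays the role of the simulator, and different prompts (conditionals) roll out different simulacra from the same fixed model. The corpus and all names are invented for illustration.

```python
import random
from collections import defaultdict, Counter

# Toy training corpus (invented for illustration).
corpus = (
    "the cat sat on the mat . the dog sat on the rug . "
    "the cat chased the dog . the dog chased the cat ."
).split()

# "Training": estimate P(next token | current token) from the corpus.
transition_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    transition_counts[prev][nxt] += 1

def sample_next(token):
    """Sample a next token from the learned conditional distribution."""
    nexts = transition_counts[token]
    tokens, weights = zip(*nexts.items())
    return random.choices(tokens, weights=weights)[0]

def simulate(prompt, length=8):
    """Roll out a trajectory: the prompt is the conditional, the rollout a simulacrum."""
    tokens = prompt.split()
    for _ in range(length):
        tokens.append(sample_next(tokens[-1]))
    return " ".join(tokens)

# One fixed model, two different conditionals, two different rollouts.
print(simulate("the cat"))
print(simulate("the dog"))
```

The analogy, on the simulator view: a trained language model is one fixed conditional distribution over next tokens, and prompting selects which process from the training distribution gets simulated; nothing in the sampling loop privileges a single persistent agent.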

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages
Ethan Edwards · 4 Apr 2024 13:18 UTC · 14 points · 0 comments · 36 min read · LW link

The case for more ambitious language model evals
Jozdien · 30 Jan 2024 0:01 UTC · 104 points · 25 comments · 5 min read · LW link

OpenAI Credit Account (2510$)
Emirhan BULUT · 21 Jan 2024 2:32 UTC · 1 point · 0 comments · 1 min read · LW link

Motivating Alignment of LLM-Powered Agents: Easy for AGI, Hard for ASI?
RogerDearnaley · 11 Jan 2024 12:56 UTC · 22 points · 4 comments · 39 min read · LW link

Goodbye, Shoggoth: The Stage, its Animatronics, & the Puppeteer – a New Metaphor
RogerDearnaley · 9 Jan 2024 20:42 UTC · 46 points · 8 comments · 36 min read · LW link

On the future of language models
owencb · 20 Dec 2023 16:58 UTC · 105 points · 17 comments · 1 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)
RogerDearnaley · 28 Nov 2023 19:56 UTC · 64 points · 30 comments · 11 min read · LW link

Introduction and current research agenda
quila · 20 Nov 2023 12:42 UTC · 27 points · 1 comment · 1 min read · LW link

Is Interpretability All We Need?
RogerDearnaley · 14 Nov 2023 5:31 UTC · 1 point · 1 comment · 1 min read · LW link

[Question] Impressions from base-GPT-4?
mishka · 8 Nov 2023 5:43 UTC · 24 points · 18 comments · 1 min read · LW link

Revealing Intentionality In Language Models Through AdaVAE Guided Sampling
jdp · 20 Oct 2023 7:32 UTC · 117 points · 14 comments · 22 min read · LW link

The utility of humans within a Super Artificial Intelligence realm.
Marc Monroy · 11 Oct 2023 17:30 UTC · 1 point · 0 comments · 7 min read · LW link

FAQ: What the heck is goal agnosticism?
porby · 8 Oct 2023 19:11 UTC · 66 points · 36 comments · 28 min read · LW link

A Mathematical Model for Simulators
lukemarks · 2 Oct 2023 6:46 UTC · 11 points · 0 comments · 2 min read · LW link

The Löbian Obstacle, And Why You Should Care
lukemarks · 7 Sep 2023 23:59 UTC · 18 points · 6 comments · 2 min read · LW link

Memetic Judo #3: The Intelligence of Stochastic Parrots v.2
Max TK · 20 Aug 2023 15:18 UTC · 8 points · 33 comments · 6 min read · LW link

Using predictors in corrigible systems
porby · 19 Jul 2023 22:29 UTC · 19 points · 6 comments · 27 min read · LW link

Unsafe AI as Dynamical Systems
Robert_AIZI · 14 Jul 2023 15:31 UTC · 11 points · 0 comments · 3 min read · LW link (aizi.substack.com)

How I Learned To Stop Worrying And Love The Shoggoth
Peter Merel · 12 Jul 2023 17:47 UTC · 10 points · 9 comments · 5 min read · LW link

One path to coherence: conditionalization
porby · 29 Jun 2023 1:08 UTC · 28 points · 4 comments · 4 min read · LW link

Philosophical Cyborg (Part 2)...or, The Good Successor
ukc10014 · 21 Jun 2023 15:43 UTC · 21 points · 1 comment · 31 min read · LW link

Partial Simulation Extrapolation: A Proposal for Building Safer Simulators
lukemarks · 17 Jun 2023 13:55 UTC · 16 points · 0 comments · 10 min read · LW link

Philosophical Cyborg (Part 1)
14 Jun 2023 16:20 UTC · 31 points · 4 comments · 13 min read · LW link

Higher Dimension Cartesian Objects and Aligning ‘Tiling Simulators’
lukemarks · 11 Jun 2023 0:13 UTC · 22 points · 0 comments · 5 min read · LW link

The (local) unit of intelligence is FLOPs
boazbarak · 5 Jun 2023 18:23 UTC · 40 points · 7 comments · 5 min read · LW link

RecurrentGPT: a loom-type tool with a twist
mishka · 25 May 2023 17:09 UTC · 10 points · 0 comments · 3 min read · LW link (arxiv.org)

The Compleat Cybornaut
19 May 2023 8:44 UTC · 64 points · 2 comments · 16 min read · LW link

Collective Identity
18 May 2023 9:00 UTC · 59 points · 12 comments · 8 min read · LW link

Notes on Antelligence
Aurigena · 13 May 2023 18:38 UTC · 2 points · 0 comments · 9 min read · LW link

Research Report: Incorrectness Cascades (Corrected)
Robert_AIZI · 9 May 2023 21:54 UTC · 9 points · 0 comments · 9 min read · LW link (aizi.substack.com)

A smart enough LLM might be deadly simply if you run it for long enough
Mikhail Samin · 5 May 2023 20:49 UTC · 16 points · 16 comments · 8 min read · LW link

Simulators Increase the Likelihood of Alignment by Default
Wuschel Schulz · 30 Apr 2023 16:32 UTC · 13 points · 1 comment · 5 min read · LW link

I was Wrong, Simulator Theory is Real
Robert_AIZI · 26 Apr 2023 17:45 UTC · 75 points · 7 comments · 3 min read · LW link (aizi.substack.com)

Research Report: Incorrectness Cascades
Robert_AIZI · 14 Apr 2023 12:49 UTC · 19 points · 0 comments · 10 min read · LW link (aizi.substack.com)

[Question] Goals of model vs. goals of simulacra?
dr_s · 12 Apr 2023 13:02 UTC · 5 points · 7 comments · 1 min read · LW link

Alignment of AutoGPT agents
Ozyrus · 12 Apr 2023 12:54 UTC · 14 points · 1 comment · 4 min read · LW link

Why Simulator AIs want to be Active Inference AIs
10 Apr 2023 18:23 UTC · 86 points · 8 comments · 8 min read · LW link

GPTs are Predictors, not Imitators
Eliezer Yudkowsky · 8 Apr 2023 19:59 UTC · 365 points · 88 comments · 3 min read · LW link

ICA Simulacra
Ozyrus · 5 Apr 2023 6:41 UTC · 26 points · 2 comments · 7 min read · LW link

Early Results: Do LLMs complete false equations with false equations?
Robert_AIZI · 30 Mar 2023 20:14 UTC · 14 points · 0 comments · 4 min read · LW link (aizi.substack.com)

Remarks 1–18 on GPT (compressed)
Cleo Nardo · 20 Mar 2023 22:27 UTC · 146 points · 35 comments · 31 min read · LW link

Super-Luigi = Luigi + (Luigi - Waluigi)
Alexei · 17 Mar 2023 15:27 UTC · 16 points · 9 comments · 1 min read · LW link

The algorithm isn’t doing X, it’s just doing Y.
Cleo Nardo · 16 Mar 2023 23:28 UTC · 53 points · 43 comments · 5 min read · LW link

Why do we assume there is a “real” shoggoth behind the LLM? Why not masks all the way down?
Robert_AIZI · 9 Mar 2023 17:28 UTC · 61 points · 48 comments · 2 min read · LW link

Situational awareness in Large Language Models
Simon Möller · 3 Mar 2023 18:59 UTC · 28 points · 2 comments · 7 min read · LW link

The Waluigi Effect (mega-post)
Cleo Nardo · 3 Mar 2023 3:22 UTC · 617 points · 188 comments · 16 min read · LW link

Implied “utilities” of simulators are broad, dense, and shallow
porby · 1 Mar 2023 3:23 UTC · 43 points · 7 comments · 3 min read · LW link

[Simulators seminar sequence] #2 Semiotic physics - revamped
27 Feb 2023 0:25 UTC · 23 points · 23 comments · 13 min read · LW link

Agents vs. Predictors: Concrete differentiating factors
evhub · 24 Feb 2023 23:50 UTC · 37 points · 3 comments · 4 min read · LW link

Pretraining Language Models with Human Preferences
21 Feb 2023 17:57 UTC · 133 points · 18 comments · 11 min read · LW link

You’re not a simulation, ’cause you’re hallucinating
Stuart_Armstrong · 21 Feb 2023 12:12 UTC · 25 points · 6 comments · 1 min read · LW link

Instrumentality makes agents agenty
porby · 21 Feb 2023 4:28 UTC · 19 points · 4 comments · 6 min read · LW link

Two problems with ‘Simulators’ as a frame
ryan_greenblatt · 17 Feb 2023 23:34 UTC · 81 points · 13 comments · 5 min read · LW link

A note on ‘semiotic physics’
metasemi · 11 Feb 2023 5:12 UTC · 11 points · 13 comments · 6 min read · LW link

Cyborgism
10 Feb 2023 14:47 UTC · 333 points · 45 comments · 35 min read · LW link

Conditioning Predictive Models: Deployment strategy
9 Feb 2023 20:59 UTC · 28 points · 0 comments · 10 min read · LW link

Conditioning Predictive Models: Interactions with other approaches
8 Feb 2023 18:19 UTC · 32 points · 2 comments · 11 min read · LW link

Conditioning Predictive Models: Making inner alignment as easy as possible
7 Feb 2023 20:04 UTC · 27 points · 2 comments · 19 min read · LW link

Conditioning Predictive Models: The case for competitiveness
6 Feb 2023 20:08 UTC · 20 points · 3 comments · 11 min read · LW link

Conditioning Predictive Models: Outer alignment via careful conditioning
2 Feb 2023 20:28 UTC · 70 points · 13 comments · 57 min read · LW link

Conditioning Predictive Models: Large language models as predictors
2 Feb 2023 20:28 UTC · 88 points · 4 comments · 13 min read · LW link

Inner Misalignment in “Simulator” LLMs
Adam Scherlis · 31 Jan 2023 8:33 UTC · 84 points · 11 comments · 4 min read · LW link

Gradient Filtering
18 Jan 2023 20:09 UTC · 54 points · 16 comments · 13 min read · LW link

Underspecification of Oracle AI
15 Jan 2023 20:10 UTC · 30 points · 12 comments · 19 min read · LW link

[ASoT] Simulators show us behavioural properties by default
Jozdien · 13 Jan 2023 18:42 UTC · 33 points · 2 comments · 3 min read · LW link

[Question] Could Simulating an AGI Taking Over the World Actually Lead to a LLM Taking Over the World?
simeon_c · 13 Jan 2023 6:33 UTC · 15 points · 1 comment · 1 min read · LW link

Simulacra are Things
janus · 8 Jan 2023 23:03 UTC · 63 points · 7 comments · 2 min read · LW link

Implications of simulators
ThomasW · 7 Jan 2023 0:37 UTC · 17 points · 0 comments · 12 min read · LW link

The Limit of Language Models
DragonGod · 6 Jan 2023 23:53 UTC · 43 points · 26 comments · 4 min read · LW link

[Simulators seminar sequence] #1 Background & shared assumptions
2 Jan 2023 23:48 UTC · 49 points · 4 comments · 3 min read · LW link

‘simulator’ framing and confusions about LLMs
Beth Barnes · 31 Dec 2022 23:38 UTC · 104 points · 11 comments · 4 min read · LW link

Prosaic misalignment from the Solomonoff Predictor
Cleo Nardo · 9 Dec 2022 17:53 UTC · 40 points · 2 comments · 5 min read · LW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models
5 Dec 2022 20:28 UTC · 40 points · 19 comments · 10 min read · LW link

[ASoT] Finetuning, RL, and GPT’s world prior
Jozdien · 2 Dec 2022 16:33 UTC · 44 points · 8 comments · 5 min read · LW link

Simulators, constraints, and goal agnosticism: porbynotes vol. 1
porby · 23 Nov 2022 4:22 UTC · 37 points · 2 comments · 35 min read · LW link

When can a mimic surprise you? Why generative models handle seemingly ill-posed problems
David Johnston · 5 Nov 2022 13:19 UTC · 8 points · 4 comments · 16 min read · LW link

Simulators
janus · 2 Sep 2022 12:45 UTC · 594 points · 161 comments · 41 min read · LW link (generative.ink) · 8 reviews

AGI-level reasoner will appear sooner than an agent; what the humanity will do with this reasoner is critical
Roman Leventov · 30 Jul 2022 20:56 UTC · 24 points · 10 comments · 1 min read · LW link

Conditioning Generative Models for Alignment
Jozdien · 18 Jul 2022 7:11 UTC · 58 points · 8 comments · 20 min read · LW link

Conditioning Generative Models
Adam Jermyn · 25 Jun 2022 22:15 UTC · 24 points · 18 comments · 10 min read · LW link