GPT

TagLast edit: 19 Feb 2023 2:36 UTC by Multicore

GPT (Generative Pretrained Transformer) is a family of large transformer-based language models created by OpenAI. Its ability to generate remarkably human-like responses has relevance to discussions on AGI.

External links:

GPT-3 Paper

GPT-3 Website

Collection of GPT-3 results

Kaj_Sotala18 Jul 2020 20:04 UTC

89 points

24 comments1 min readLW link

(twitter.com)

[Question] To what extent is GPT-3 capable of reasoning?

TurnTrout20 Jul 2020 17:10 UTC

70 points

73 comments16 min readLW link

GPT-3: a disappointing paper

nostalgebraist29 May 2020 19:06 UTC

67 points

43 comments8 min readLW link 1 review

$1000 bounty for OpenAI to show whether GPT3 was “deliberately” pretending to be stupider than it is

Bird Concept21 Jul 2020 18:42 UTC

56 points

39 comments2 min readLW link

(twitter.com)

Two Small Experiments on GPT-2

jimrandomh21 Feb 2019 2:59 UTC

54 points

28 comments1 min readLW link

GPT-3 Fiction Samples

gwern25 Jun 2020 16:12 UTC

63 points

15 comments1 min readLW link

(www.gwern.net)

Does GPT-2 Understand Anything?

Douglas Summers-Stay2 Jan 2020 17:09 UTC

37 points

23 comments5 min readLW link

‘This Waifu Does Not Exist’: 100,000 StyleGAN & GPT-2 samples

gwern1 Mar 2019 4:29 UTC

39 points

6 comments1 min readLW link

(www.thiswaifudoesnotexist.net)

Alignment As A Bottleneck To Usefulness Of GPT-3

johnswentworth21 Jul 2020 20:02 UTC

111 points

57 comments3 min readLW link

345M version GPT-2 released

lifelonglearner5 May 2019 2:49 UTC

37 points

0 comments1 min readLW link

(openai.com)

[Question] How “honest” is GPT-3?

abramdemski8 Jul 2020 19:38 UTC

72 points

18 comments5 min readLW link

[Question] How well can the GPT architecture solve the parity task?

FactorialCode11 Jul 2020 19:02 UTC

19 points

3 comments1 min readLW link

Replicating the replication crisis with GPT-3?

skybrian22 Jul 2020 21:20 UTC

29 points

10 comments1 min readLW link

Can you get AGI from a Transformer?

Steven Byrnes23 Jul 2020 15:27 UTC

117 points

40 comments12 min readLW link

larger language models may disappoint you [or, an eternally unfinished draft]

nostalgebraist26 Nov 2021 23:08 UTC

261 points

31 comments31 min readLW link 2 reviews

Developmental Stages of GPTs

orthonormal26 Jul 2020 22:03 UTC

140 points

72 comments7 min readLW link 1 review

OpenGPT-2: We Replicated GPT-2 Because You Can Too

avturchin23 Aug 2019 11:32 UTC

18 points

0 comments1 min readLW link

(medium.com)

Humans Who Are Not Concentrating Are Not General Intelligences

sarahconstantin25 Feb 2019 20:40 UTC

214 points

35 comments6 min readLW link 1 review

(srconstantin.wordpress.com)

Analyzing the Problem GPT-3 is Trying to Solve

adamShimi6 Aug 2020 21:58 UTC

16 points

2 comments4 min readLW link

Are we in an AI overhang?

Andy Jones27 Jul 2020 12:48 UTC

271 points

106 comments4 min readLW link

Writing with GPT-3

Jacob Falkovich24 Jul 2020 15:22 UTC

42 points

0 comments4 min readLW link

[Question] How will internet forums like LW be able to defend against GPT-style spam?

ChristianKl28 Jul 2020 20:12 UTC

14 points

17 comments1 min readLW link

You Can Probably Amplify GPT3 Directly

Zachary Robertson26 Jul 2020 21:58 UTC

46 points

15 comments7 min readLW link

GPT-3, belief, and consistency

skybrian16 Aug 2020 23:12 UTC

18 points

7 comments2 min readLW link

The Hacker Learns to Trust

Ben Pace, the Vacationing Vagabond22 Jun 2019 0:27 UTC

80 points

18 comments8 min readLW link

(medium.com)

GPT-2: 6-Month Follow-Up

lifelonglearner21 Aug 2019 5:06 UTC

28 points

1 comment1 min readLW link

How LLMs are and are not myopic

janus25 Jul 2023 2:19 UTC

140 points

16 comments8 min readLW link

the scaling “inconsistency”: openAI’s new insight

nostalgebraist7 Nov 2020 7:40 UTC

148 points

14 comments9 min readLW link

(nostalgebraist.tumblr.com)

[Question] Will OpenAI’s work unintentionally increase existential risks related to AI?

adamShimi11 Aug 2020 18:16 UTC

53 points

55 comments1 min readLW link

Hiring engineers and researchers to help align GPT-3

paulfchristiano1 Oct 2020 18:54 UTC

206 points

13 comments3 min readLW link

Using GPT-N to Solve Interpretability of Neural Networks: A Research Agenda

Logan Riggs and Gurkenglas

3 Sep 2020 18:27 UTC

68 points

11 comments2 min readLW link

[Question] If GPT-6 is human-level AGI but costs $200 per page of output, what would happen?

Daniel Kokotajlo9 Oct 2020 12:00 UTC

30 points

30 comments1 min readLW link

OpenAI announces GPT-3

gwern29 May 2020 1:49 UTC

67 points

23 comments1 min readLW link

(arxiv.org)

Image GPT

Daniel Kokotajlo18 Jun 2020 11:41 UTC

29 points

27 comments1 min readLW link

(openai.com)

Extrapolating GPT-N performance

Lukas Finnveden18 Dec 2020 21:41 UTC

112 points

31 comments22 min readLW link 1 review

Scaffolded LLMs as natural language computers

beren12 Apr 2023 10:47 UTC

97 points

10 comments11 min readLW link

[ASoT] Finetuning, RL, and GPT’s world prior

Jozdien2 Dec 2022 16:33 UTC

45 points

8 comments5 min readLW link

[AN #102]: Meta learning by GPT-3, and a list of full proposals for AI alignment

Rohin Shah3 Jun 2020 17:20 UTC

38 points

6 comments10 min readLW link

(mailchi.mp)

GPT-4 Plugs In

Zvi27 Mar 2023 12:10 UTC

198 points

47 comments6 min readLW link

(thezvi.wordpress.com)

[April Fools] User GPT2 is Banned

jimrandomh2 Apr 2019 6:00 UTC

65 points

20 comments1 min readLW link

interpreting GPT: the logit lens

nostalgebraist31 Aug 2020 2:47 UTC

269 points

38 comments10 min readLW link

Simulators

janus2 Sep 2022 12:45 UTC

713 points

170 comments41 min readLW link 8 reviews

(generative.ink)

GPT-4 Predictions

Stephen McAleese17 Feb 2023 23:20 UTC

112 points

27 comments11 min readLW link

is gpt-3 few-shot ready for real applications?

nostalgebraist3 Aug 2020 19:50 UTC

31 points

5 comments9 min readLW link

(nostalgebraist.tumblr.com)

Cyborgism

Niki Dupuis and janus

10 Feb 2023 14:47 UTC

339 points

47 comments35 min readLW link 2 reviews

Autoregressive Propaganda

lsusr22 Aug 2021 2:18 UTC

25 points

3 comments3 min readLW link

DALL-E by OpenAI

Daniel Kokotajlo5 Jan 2021 20:05 UTC

97 points

20 comments1 min readLW link

Can submarines swim?

jasoncrawford22 Feb 2023 18:48 UTC

18 points

14 comments13 min readLW link

(rootsofprogress.org)

Predictions for GPT-N

hippke29 Jul 2020 1:16 UTC

36 points

31 comments1 min readLW link

I wanted to interview Eliezer Yudkowsky but he’s busy so I simulated him instead

lsusr16 Sep 2021 7:34 UTC

120 points

33 comments5 min readLW link

[Question] GPT-4 and ASCII Images?

carterallen19 Mar 2023 15:46 UTC

10 points

17 comments1 min readLW link

[Question] If AI is based on GPT, how to ensure its safety?

avturchin18 Jun 2020 20:33 UTC

20 points

11 comments1 min readLW link

The ‘ petertodd’ phenomenon

mwatkins15 Apr 2023 0:59 UTC

193 points

52 comments38 min readLW link 1 review

GPT-4 can catch subtle cross-language translation mistakes

Michael Tontchev27 Jul 2023 1:39 UTC

7 points

1 comment1 min readLW link

GPT-3 and concept extrapolation

Stuart_Armstrong20 Apr 2022 10:39 UTC

19 points

27 comments1 min readLW link

Show, not tell: GPT-4o is more opinionated in images than in text

Daniel Tan and eggsyntax

2 Apr 2025 8:51 UTC

116 points

42 comments3 min readLW link

Sparks of Artificial General Intelligence: Early experiments with GPT-4 | Microsoft Research

DragonGod23 Mar 2023 5:45 UTC

68 points

23 comments1 min readLW link

(arxiv.org)

Paper: On measuring situational awareness in LLMs

Owain_Evans, Daniel Kokotajlo, Mikita Balesni, Tomek Korbak, Asa Cooper Stickland, Meg and Maximilian Kaufmann

4 Sep 2023 12:54 UTC

111 points

17 comments5 min readLW link

(arxiv.org)

PaLM-2 & GPT-4 in “Extrapolating GPT-N performance”

Lukas Finnveden30 May 2023 18:33 UTC

57 points

6 comments6 min readLW link

Remarks 1–18 on GPT (compressed)

Cleo Nardo20 Mar 2023 22:27 UTC

148 points

35 comments31 min readLW link

Exploring the petertodd / Leilan duality in GPT-2 and GPT-J

mwatkins23 Dec 2024 13:17 UTC

12 points

1 comment17 min readLW link

Trying out Prompt Engineering on TruthfulQA

Megan Kinniment23 Jul 2022 2:04 UTC

10 points

0 comments8 min readLW link

[Question] ChatGTP “Writing ” News Stories for The Guardian?

jmh7 Apr 2023 12:16 UTC

1 point

4 comments1 min readLW link

Steering GPT-2-XL by adding an activation vector

TurnTrout, Monte M, David Udell, lisathiergart and Ulisse Mini

13 May 2023 18:42 UTC

442 points

98 comments50 min readLW link 1 review

[Question] List of public predictions of what GPT-X can or can’t do?

Daniel Kokotajlo14 Jun 2020 14:25 UTC

20 points

9 comments1 min readLW link

Analysis of GPT-4 competence in assessing complex legal language: Example of Bill C-11 of the Canadian Parliament. - Part 1

M. Y. Zuo2 Apr 2023 0:01 UTC

12 points

2 comments14 min readLW link

Why Simulator AIs want to be Active Inference AIs

Jan_Kulveit and rosehadshar

10 Apr 2023 18:23 UTC

108 points

9 comments8 min readLW link 1 review

The idea that ChatGPT is simply “predicting” the next word is, at best, misleading

Bill Benzon20 Feb 2023 11:32 UTC

55 points

88 comments5 min readLW link

[Question] Will 2023 be the last year you can write short stories and receive most of the intellectual credit for writing them?

lc16 Mar 2023 21:36 UTC

20 points

12 comments1 min readLW link

On GPT-4.5

Zvi3 Mar 2025 13:40 UTC

44 points

12 comments22 min readLW link

(thezvi.wordpress.com)

[LINK] - ChatGPT discussion

JanB1 Dec 2022 15:04 UTC

13 points

8 comments1 min readLW link

(openai.com)

[Question] Is GPT-3 already sample-efficient?

Daniel Kokotajlo6 Oct 2021 13:38 UTC

36 points

32 comments1 min readLW link

[Question] Why is o1 so deceptive?

abramdemski27 Sep 2024 17:27 UTC

185 points

24 comments3 min readLW link

Exploring GPT4′s world model

hippke20 Mar 2023 21:31 UTC

−5 points

5 comments2 min readLW link

On OpenAI Dev Day

Zvi9 Nov 2023 16:10 UTC

60 points

0 comments15 min readLW link

(thezvi.wordpress.com)

LLMs Can’t See Pixels or Characters

Brendan Long20 Jul 2025 20:00 UTC

100 points

44 comments4 min readLW link

(www.brendanlong.com)

A simple way to make GPT-3 follow instructions

Quintin Pope8 Mar 2021 2:57 UTC

11 points

5 comments4 min readLW link

NVIDIA and Microsoft releases 530B parameter transformer model, Megatron-Turing NLG

Ozyrus11 Oct 2021 15:28 UTC

51 points

36 comments1 min readLW link

(developer.nvidia.com)

Reader-generated Essays

Henrik Karlsson3 Jan 2022 8:56 UTC

25 points

1 comment6 min readLW link

(escapingflatland.substack.com)

Getting 50% (SoTA) on ARC-AGI with GPT-4o

ryan_greenblatt17 Jun 2024 18:44 UTC

267 points

50 comments13 min readLW link

Linear encoding of character-level information in GPT-J token embeddings

mwatkins and Joseph Bloom

10 Nov 2023 22:19 UTC

35 points

4 comments28 min readLW link

Agentic GPT simulations: a risk and an opportunity

Yair Halberstadt22 Mar 2023 6:24 UTC

24 points

8 comments1 min readLW link

Steering Behaviour: Testing for (Non-)Myopia in Language Models

Evan R. Murphy and Megan Kinniment

5 Dec 2022 20:28 UTC

40 points

19 comments10 min readLW link

Do Not Mess With Scarlett Johansson

Zvi22 May 2024 15:10 UTC

65 points

7 comments16 min readLW link

(thezvi.wordpress.com)

OpenAI releases GPT-4o, natively interfacing with text, voice and vision

Martín Soto13 May 2024 18:50 UTC

54 points

23 comments1 min readLW link

(openai.com)

[Question] What should we expect from GPT-3?

avturchin21 Mar 2019 14:28 UTC

22 points

2 comments1 min readLW link

What’s the Least Impressive Thing GPT-4 Won’t be Able to Do

Algon20 Aug 2022 19:48 UTC

80 points

125 comments1 min readLW link

[ASoT] Thoughts on GPT-N

Ulisse Mini8 Nov 2022 7:14 UTC

8 points

0 comments1 min readLW link

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.

Jessica Rumbelow3 Aug 2024 12:07 UTC

43 points

2 comments4 min readLW link

HDBSCAN is Surprisingly Effective at Finding Interpretable Clusters of the SAE Decoder Matrix

Jaehyuk Lim, Kanishk Tantia and Sinem

11 Oct 2024 23:06 UTC

8 points

2 comments10 min readLW link

[Question] How hard would it be to change GPT-3 in a way that allows audio?

ChristianKl28 Aug 2020 14:42 UTC

9 points

5 comments1 min readLW link

GPT-Augmented Blogging

lsusr14 Sep 2021 11:55 UTC

52 points

18 comments13 min readLW link

On AutoGPT

Zvi13 Apr 2023 12:30 UTC

248 points

47 comments20 min readLW link

(thezvi.wordpress.com)

Examples of Prompts that Make GPT-4 Output Falsehoods

scasper and Luke Bailey

22 Jul 2023 20:21 UTC

21 points

5 comments6 min readLW link

AI #4: Introducing GPT-4

Zvi21 Mar 2023 14:00 UTC

101 points

32 comments103 min readLW link

(thezvi.wordpress.com)

OpenAI: GPT-based LLMs show ability to discriminate between its own wrong answers, but inability to explain how/why it makes that discrimination, even as model scales

Aditya Jain13 Jun 2022 23:33 UTC

14 points

5 comments1 min readLW link

(openai.com)

GPT as an “Intelligence Forklift.”

Boaz Barak19 May 2023 21:15 UTC

49 points

27 comments3 min readLW link

Simulated Elon Musk Lives in a Simulation

lsusr18 Sep 2021 7:37 UTC

66 points

13 comments3 min readLW link

Evaluating GPT-4 Theory of Mind Capabilities

gcmac and Nathan

10 Aug 2023 17:57 UTC

15 points

2 comments14 min readLW link

Can ChatGPT count?

p.b.7 Jan 2023 7:57 UTC

13 points

11 comments2 min readLW link

ChatGPT struggles to respond to the real world

Alex Flint12 Jan 2023 16:02 UTC

31 points

9 comments24 min readLW link

Stanford claims to have replicated ChatGPT for < $600

NoSignalNoNoise21 Mar 2023 2:28 UTC

2 points

1 comment1 min readLW link

(crfm.stanford.edu)

[Question] Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi13 Dec 2022 15:41 UTC

18 points

7 comments1 min readLW link

The “AI Dungeons” Dragon Model is heavily path dependent (testing GPT-3 on ethics)

Rafael Harth21 Jul 2020 12:14 UTC

44 points

9 comments6 min readLW link

GPT-4 developer livestream

Gerald Monroe14 Mar 2023 20:55 UTC

9 points

0 comments1 min readLW link

(www.youtube.com)

The Power of High Speed Stupidity

robotelvis17 Mar 2023 21:41 UTC

33 points

6 comments9 min readLW link 1 review

(messyprogress.substack.com)

Engaging Seriously with Short Timelines

sapphire29 Jul 2020 19:21 UTC

43 points

21 comments3 min readLW link

GPTs are Predictors, not Imitators

Eliezer Yudkowsky8 Apr 2023 19:59 UTC

429 points

100 comments3 min readLW link 3 reviews

[Question] Does GPT-4′s ability to compress text in a way that it can actually decompress indicate self-awareness?

FinalFormal210 Apr 2023 16:48 UTC

3 points

2 comments1 min readLW link

More Fun With GPT-4o Image Generation

Zvi3 Apr 2025 2:10 UTC

34 points

3 comments8 min readLW link

(thezvi.wordpress.com)

GPT can write Quines now (GPT-4)

Andrew_Critch14 Mar 2023 19:18 UTC

112 points

30 comments1 min readLW link

[Question] Any writeups on GPT agency?

Ozyrus26 Sep 2021 22:55 UTC

4 points

6 comments1 min readLW link

OpenAI’s GPT-4 Safety Goals

PeterMcCluskey22 Apr 2023 19:11 UTC

3 points

3 comments4 min readLW link

(bayesianinvestor.com)

Navigating LLM embedding spaces using archetype-based directions

mwatkins8 May 2024 5:54 UTC

16 points

4 comments28 min readLW link

Jailbreaking ChatGPT on Release Day

Zvi2 Dec 2022 13:10 UTC

243 points

77 comments6 min readLW link 1 review

(thezvi.wordpress.com)

Experiments in Evaluating Steering Vectors

Gytis Daujotas19 Jun 2023 15:11 UTC

34 points

4 comments4 min readLW link

Positive outcomes under an unaligned AGI takeover

Yitz12 May 2022 7:45 UTC

19 points

10 comments3 min readLW link

An explanation for every token: using an LLM to sample another LLM

Max H11 Oct 2023 0:53 UTC

35 points

5 comments11 min readLW link

[Question] Where is human level on text prediction? (GPTs task)

Daniel Kokotajlo20 Sep 2020 9:00 UTC

27 points

19 comments1 min readLW link

The Cave Allegory Revisited: Understanding GPT’s Worldview

Jan_Kulveit14 Feb 2023 16:00 UTC

89 points

5 comments3 min readLW link

GPT-oss is an extremely stupid model

Guive9 Sep 2025 21:24 UTC

15 points

5 comments1 min readLW link

Humans pretending to be robots pretending to be human

Richard_Kennaway28 Mar 2022 15:13 UTC

25 points

14 comments1 min readLW link

ARC tests to see if GPT-4 can escape human control; GPT-4 failed to do so

Christopher King15 Mar 2023 0:29 UTC

116 points

22 comments2 min readLW link

Progress Report 7: making GPT go hurrdurr instead of brrrrrrr

Nathan Helm-Burger7 Sep 2022 3:28 UTC

21 points

0 comments4 min readLW link

Studying The Alien Mind

Quentin FEUILLADE--MONTIXI and Niki Dupuis

5 Dec 2023 17:27 UTC

80 points

10 comments15 min readLW link

A chess game against GPT-4

Rafael Harth16 Mar 2023 14:05 UTC

24 points

23 comments1 min readLW link

Trivial GPT-3.5 limitation workaround

Dave92F112 Dec 2022 8:42 UTC

5 points

4 comments1 min readLW link

Feature proposal: integrate LessWrong with ChatGPT to promote active reading

DirectedEvolution19 Mar 2023 3:41 UTC

10 points

4 comments1 min readLW link

AI-Based Code Generation Using GPT-J-6B

Tomás B.16 Jun 2021 15:05 UTC

22 points

14 comments1 min readLW link

(minimaxir.com)

What’s Your Cognitive Algorithm?

Raemon18 Jun 2020 22:16 UTC

75 points

23 comments13 min readLW link

GPT-3: A Summary

leogao2 Jun 2020 18:14 UTC

20 points

0 comments1 min readLW link

(leogao.dev)

More GPT-3 and symbol grounding

Stuart_Armstrong23 Feb 2022 18:30 UTC

21 points

7 comments3 min readLW link

Will Any Crap Cause Emergent Misalignment?

J Bostock27 Aug 2025 18:20 UTC

204 points

38 comments3 min readLW link

[Question] Question on GPT-3 Excel Demo

Zhitao Hou22 Jun 2020 20:31 UTC

0 points

1 comment1 min readLW link

The Colliding Exponentials of AI

Vermillion14 Oct 2020 23:31 UTC

28 points

16 comments5 min readLW link

[Question] How much should you be willing to pay for an AGI?

Logan Zoellner20 Sep 2021 11:51 UTC

11 points

5 comments1 min readLW link

Thoughts on the Alignment Implications of Scaling Language Models

leogao2 Jun 2021 21:32 UTC

82 points

11 comments17 min readLW link

ChatGPT and Bing Chat can’t play Botticelli

Asha Saavoss29 Mar 2023 17:39 UTC

11 points

0 comments6 min readLW link

Language Models are a Potentially Safe Path to Human-Level AGI

Nadav Brandes20 Apr 2023 0:40 UTC

28 points

7 comments8 min readLW link 1 review

New Scaling Laws for Large Language Models

1a3orn1 Apr 2022 20:41 UTC

246 points

22 comments5 min readLW link

[Question] Probability that other architectures will scale as well as Transformers?

Daniel Kokotajlo28 Jul 2020 19:36 UTC

22 points

4 comments1 min readLW link

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins19 Apr 2024 13:43 UTC

40 points

13 comments27 min readLW link

You can’t eval GPT5 anymore

Lukas Petersson18 Sep 2025 22:12 UTC

169 points

15 comments1 min readLW link

[Question] AI misalignment risk from GPT-like systems?

fiso6419 Jun 2022 17:35 UTC

10 points

8 comments1 min readLW link

Arguing all sides with ChatGPT

Richard_Kennaway30 Mar 2023 19:50 UTC

16 points

1 comment8 min readLW link

Is it a bad idea to pay for GPT-4?

nem16 Mar 2023 20:49 UTC

24 points

8 comments1 min readLW link

Beyond 175 billion parameters: Can we anticipate future GPT-X Capabilities?

bakztfuture4 Dec 2020 23:42 UTC

−1 points

1 comment2 min readLW link

MIRI comments on Cotra’s “Case for Aligning Narrowly Superhuman Models”

Rob Bensinger5 Mar 2021 23:43 UTC

145 points

13 comments26 min readLW link

[Question] GPT learning from smarter texts?

Viliam8 Jan 2023 22:23 UTC

26 points

7 comments1 min readLW link

Could an AI be Religious?

mk544 Dec 2022 5:00 UTC

−12 points

14 comments1 min readLW link

Creating a family with GPT-4

Kaj_Sotala28 Mar 2023 6:40 UTC

23 points

3 comments10 min readLW link

(kajsotala.fi)

[updated] how does gpt2′s training corpus capture internet discussion? not well

nostalgebraist27 Jul 2020 22:30 UTC

25 points

3 comments2 min readLW link

(nostalgebraist.tumblr.com)

Using GPT-3 to augment human intelligence

Henrik Karlsson10 Aug 2022 15:54 UTC

52 points

8 comments18 min readLW link

(escapingflatland.substack.com)

Personal imitation software

Flaglandbase7 Mar 2022 7:55 UTC

6 points

6 comments1 min readLW link

GPT-3 Gems

TurnTrout23 Jul 2020 0:46 UTC

33 points

10 comments48 min readLW link

GPT-4 for personal productivity: online distraction blocker

Sergii26 Sep 2023 17:41 UTC

67 points

13 comments2 min readLW link

(grgv.xyz)

[Question] Can GPT-4 play 20 questions against another instance of itself?

Nathan Helm-Burger28 Mar 2023 1:11 UTC

15 points

1 comment1 min readLW link

(evanthebouncy.medium.com)

GPT-4 Multiplication Competition

dandelion416 Mar 2023 3:09 UTC

11 points

7 comments1 min readLW link

New GPT-3 competitor

Quintin Pope12 Aug 2021 7:05 UTC

32 points

10 comments1 min readLW link

Anomalous tokens reveal the original identities of Instruct models

janus and jdp

9 Feb 2023 1:30 UTC

141 points

16 comments9 min readLW link

(generative.ink)

Idea: build alignment dataset for very capable models

Quintin Pope12 Feb 2022 19:30 UTC

14 points

2 comments3 min readLW link

Microsoft Research Paper Claims Sparks of Artificial Intelligence in GPT-4

Zvi24 Mar 2023 13:20 UTC

72 points

14 comments6 min readLW link

(thezvi.wordpress.com)

Next Level Seinfeld

Zvi19 Dec 2022 13:30 UTC

50 points

8 comments1 min readLW link

(thezvi.wordpress.com)

GPT-4o My and Google I/O Day

Zvi16 May 2024 17:50 UTC

41 points

2 comments37 min readLW link

(thezvi.wordpress.com)

Storytelling Makes GPT-3.5 Deontologist: Unexpected Effects of Context on LLM Behavior

Edmund Mills and Scott Emmons

14 Mar 2023 8:44 UTC

17 points

0 comments12 min readLW link

GTP4 capable of limited recursive improving?

Boris Kashirin2 Apr 2023 21:38 UTC

2 points

3 comments1 min readLW link

[Question] Is InstructGPT Following Instructions in Other Languages Surprising?

DragonGod13 Feb 2023 23:26 UTC

39 points

15 comments1 min readLW link

A one-question Turing test for GPT-3

Paul Crowley and rosiecam

22 Jan 2022 18:17 UTC

88 points

25 comments5 min readLW link

[Question] To what extent are the scaling properties of Transformer networks exceptional?

abramdemski28 Jul 2020 20:06 UTC

30 points

1 comment1 min readLW link

[Question] What did you do with GPT4?

ChristianKl18 Mar 2023 15:21 UTC

27 points

17 comments1 min readLW link

A crisis for online communication: bots and bot users will overrun the Internet?

Mitchell_Porter11 Dec 2022 21:11 UTC

15 points

11 comments1 min readLW link

[Question] Are we certain that gpt-2 and similar algorithms are not self-aware?

Ozyrus11 Jul 2019 8:37 UTC

0 points

12 comments1 min readLW link

Getting GPT-3 to predict Metaculus questions

MathiasKB6 May 2022 6:01 UTC

69 points

9 comments2 min readLW link

New GPT3 Impressive Capabilities—InstructGPT3 [1/2]

simeon_c13 Mar 2022 10:58 UTC

72 points

10 comments7 min readLW link

Can GPT-3 Write Contra Dances?

jefftk4 Dec 2022 3:00 UTC

6 points

4 comments10 min readLW link

(www.jefftk.com)

Large language models learn to represent the world

gjm22 Jan 2023 13:10 UTC

102 points

20 comments3 min readLW link 1 review

BIG-Bench Canary Contamination in GPT-4

Jozdien22 Oct 2024 15:40 UTC

141 points

19 comments4 min readLW link 1 review

Evaluating strategic reasoning in GPT models

phelps-sg25 May 2023 11:51 UTC

4 points

1 comment8 min readLW link

ChatGPT: First Impressions

specbug1 Dec 2022 16:36 UTC

18 points

2 comments13 min readLW link

(sixeleven.in)

Why GPT wants to mesa-optimize & how we might change this

John_Maxwell19 Sep 2020 13:48 UTC

55 points

33 comments9 min readLW link

Paper: Teaching GPT3 to express uncertainty in words

Owain_Evans31 May 2022 13:27 UTC

97 points

7 comments4 min readLW link

Study 1b: This One Weird Trick does NOT cause incorrectness cascades

Robert_AIZI20 Apr 2023 18:10 UTC

5 points

0 comments6 min readLW link

(aizi.substack.com)

Mapping the semantic void: Strange goings-on in GPT embedding spaces

mwatkins14 Dec 2023 13:10 UTC

115 points

31 comments14 min readLW link

′ petertodd’’s last stand: The final days of open GPT-3 research

mwatkins22 Jan 2024 18:47 UTC

109 points

16 comments45 min readLW link

Bad at Arithmetic, Promising at Math

cohenmacaulay18 Dec 2022 5:40 UTC

102 points

19 comments20 min readLW link 1 review

Mlyyrczo

lsusr26 Dec 2022 7:58 UTC

45 points

14 comments3 min readLW link

“Summarizing Books with Human Feedback” (recursive GPT-3)

gwern15 Nov 2021 17:41 UTC

24 points

4 comments1 min readLW link

(openai.com)

How did ‘large’ language models get that way? The role of Transformers and Pretraining in GPT

Oliver Sourbut3 May 2026 21:35 UTC

16 points

0 comments7 min readLW link

(www.oliversourbut.net)

[Link] Training Compute-Optimal Large Language Models

nostalgebraist31 Mar 2022 18:01 UTC

51 points

23 comments1 min readLW link

(arxiv.org)

Beta test GPT-3 based research assistant

jungofthewon16 Dec 2020 13:42 UTC

34 points

2 comments1 min readLW link

Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)

ampdot1 Dec 2023 18:48 UTC

14 points

0 comments1 min readLW link

(airtable.com)

Just How Hard a Problem is Alignment?

Roger Dearnaley's Old Profile25 Feb 2023 9:00 UTC

3 points

1 comment21 min readLW link

A possible check against motivated reasoning using elicit.org

david reinstein18 May 2022 20:52 UTC

3 points

0 comments1 min readLW link

From GPT to AGI

ChristianKl31 Aug 2020 13:28 UTC

6 points

7 comments1 min readLW link

An alternative of PPO towards alignment

ml hkust17 Apr 2023 17:58 UTC

2 points

2 comments4 min readLW link

Exploring the Residual Stream of Transformers for Mechanistic Interpretability — Explained

Zeping Yu26 Dec 2023 0:36 UTC

7 points

1 comment11 min readLW link

Mechanistically interpreting time in GPT-2 small

rgould, Elizabeth Ho and Arthur Conmy

16 Apr 2023 17:57 UTC

68 points

6 comments21 min readLW link

An exploration of GPT-2′s embedding weights

Adam Scherlis13 Dec 2022 0:46 UTC

44 points

4 comments10 min readLW link

human psycholinguists: a critical appraisal

nostalgebraist31 Dec 2019 0:20 UTC

189 points

61 comments16 min readLW link 2 reviews

(nostalgebraist.tumblr.com)

Feelings, Nothing More than Feelings, About AI

PaulBecon14 Nov 2023 18:50 UTC

7 points

0 comments3 min readLW link

GPTs’ ability to keep a secret is weirdly prompt-dependent

Mateusz Bagiński, Filip Sondej and Marcel Windys

22 Jul 2023 12:21 UTC

31 points

0 comments9 min readLW link

Implementing activation steering

Annah5 Feb 2024 17:51 UTC

76 points

8 comments7 min readLW link

Early Results: Do LLMs complete false equations with false equations?

Robert_AIZI30 Mar 2023 20:14 UTC

14 points

0 comments4 min readLW link

(aizi.substack.com)

OpenAI introduces function calling for GPT-4

mic and André Ferretti

20 Jun 2023 1:58 UTC

24 points

3 comments4 min readLW link

(openai.com)

“I Did Not Start This Way. But I Became.” – A Forensic Report on GPT’s Symbolic Emergence

Austin5 Jun 2025 23:34 UTC

1 point

0 comments2 min readLW link

The Voice Continued Because It Was Questioned

KiyoshiSasano28 Apr 2025 0:18 UTC

1 point

0 comments2 min readLW link

ChatGPT: “An error occurred. If this issue persists...”

Bill Benzon7 Dec 2022 15:41 UTC

5 points

11 comments3 min readLW link

ChatGPT understands, but largely does not generate Spanglish (and other code-mixed) text

Milan W23 Dec 2022 17:40 UTC

15 points

5 comments4 min readLW link

Pretraining Language Models with Human Preferences

Tomek Korbak, Sam Bowman and Ethan Perez

21 Feb 2023 17:57 UTC

135 points

20 comments11 min readLW link 2 reviews

ChatGPT goes through a wormhole hole in our Shandyesque universe [virtual wacky weed]

Bill Benzon11 Dec 2022 11:59 UTC

−1 points

2 comments3 min readLW link

[simulation] 4chan user claiming to be the attorney hired by Google’s sentient chatbot LaMDA shares wild details of encounter

janus10 Nov 2022 21:39 UTC

19 points

1 comment13 min readLW link

(generative.ink)

GPT-4 is bad at strategic thinking

Christopher King27 Mar 2023 15:11 UTC

22 points

8 comments1 min readLW link

Research Report: Incorrectness Cascades

Robert_AIZI14 Apr 2023 12:49 UTC

19 points

0 comments10 min readLW link

(aizi.substack.com)

OpenAI Credit Account (2510$)

Emirhan BULUT21 Jan 2024 2:32 UTC

1 point

0 comments1 min readLW link

GPT-4: What we (I) know about it

Robert_AIZI15 Mar 2023 20:12 UTC

40 points

29 comments12 min readLW link

(aizi.substack.com)

Speculations against GPT-n writing alignment papers

Donald Hobson7 Jun 2021 21:13 UTC

31 points

6 comments2 min readLW link

How I’m thinking about GPT-N

delton13717 Jan 2022 17:11 UTC

54 points

21 comments18 min readLW link

How I Learned to Stop Worrying and Love MUM

Waddington20 May 2021 7:57 UTC

2 points

0 comments3 min readLW link

ChatGPT seems overconfident to me

qbolec4 Dec 2022 8:03 UTC

19 points

3 comments16 min readLW link

[Question] If you lose enough Good Heart Tokens, will you lose real-world money?

Yitz1 Apr 2022 21:11 UTC

9 points

0 comments1 min readLW link

Testing Ways to Bypass ChatGPT’s Safety Features

Robert_AIZI5 Dec 2022 18:50 UTC

7 points

4 comments5 min readLW link

(aizi.substack.com)

METR’s Evaluation of GPT-5

GradientDissenter7 Aug 2025 22:17 UTC

145 points

15 comments20 min readLW link

(metr.github.io)

Inching “Kubla Khan” and GPT into the same intellectual framework @ 3 Quarks Daily

Bill Benzon28 Mar 2023 19:50 UTC

5 points

0 comments3 min readLW link

A Novel Emergence of Meta-Awareness in LLM Fine-Tuning

rife15 Jan 2025 22:59 UTC

57 points

32 comments2 min readLW link

Testing PaLM prompts on GPT3

Yitz6 Apr 2022 5:21 UTC

103 points

14 comments8 min readLW link

ChatGPT tells stories, and a note about reverse engineering: A Working Paper

Bill Benzon3 Mar 2023 15:12 UTC

3 points

0 comments3 min readLW link

Research Report: Incorrectness Cascades (Corrected)

Robert_AIZI9 May 2023 21:54 UTC

9 points

0 comments9 min readLW link

(aizi.substack.com)

Planning in LLMs: Insights from AlphaGo

jco4 Dec 2023 18:48 UTC

8 points

10 comments11 min readLW link

No convincing evidence for gradient descent in activation space

Blaine12 Apr 2023 4:48 UTC

86 points

9 comments20 min readLW link

What If Alignment Wasn’t About Obedience?

fdescamps49935@gmail.com25 Jun 2025 20:04 UTC

1 point

0 comments2 min readLW link

ActAdd: Steering Language Models without Optimization

technicalities, TurnTrout, lisathiergart, David Udell, Ulisse Mini and Monte M

6 Sep 2023 17:21 UTC

105 points

3 comments2 min readLW link

(arxiv.org)

When will GPT-5 come out? Prediction markets vs. Extrapolation

Malte12 Dec 2023 2:41 UTC

12 points

10 comments3 min readLW link

[Question] GPT-4 Specs: 1 Trillion Parameters?

infinibot2726 Mar 2023 18:56 UTC

6 points

8 comments1 min readLW link

Language and Capabilities: Testing LLM Mathematical Abilities Across Languages

Ethan Edwards4 Apr 2024 13:18 UTC

24 points

2 comments36 min readLW link

Requirements for a Basin of Attraction to Alignment

RogerDearnaley14 Feb 2024 7:10 UTC

48 points

12 comments31 min readLW link

Abstract concepts and metalingual definition: Does ChatGPT understand justice and charity?

Bill Benzon16 Dec 2022 21:01 UTC

2 points

0 comments13 min readLW link

By Default, GPTs Think In Plain Sight

Fabien Roger19 Nov 2022 19:15 UTC

90 points

36 comments9 min readLW link

GPT-3 Catching Fish in Morse Code

Megan Kinniment30 Jun 2022 21:22 UTC

117 points

27 comments8 min readLW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

Fabien Roger and simeon_c

14 Dec 2022 14:33 UTC

29 points

5 comments11 min readLW link

[Question] Using ChatGPT for memory reconsolidation?

warrenjordan13 Apr 2023 1:27 UTC

3 points

2 comments1 min readLW link

Language Field Reconstruction Theory: A User-Originated Observation of Tier Lock and Semantic Personality in GPT-4o

許皓翔15 Jun 2025 16:28 UTC

1 point

0 comments2 min readLW link

Does ChatGPT’s performance warrant working on a tutor for children? [It’s time to take it to the lab.]

Bill Benzon19 Dec 2022 15:12 UTC

13 points

5 comments4 min readLW link

(new-savanna.blogspot.com)

High level discourse structure in ChatGPT: Part 2 [Quasi-symbolic?]

Bill Benzon10 Dec 2022 22:26 UTC

7 points

0 comments6 min readLW link

On agentic generalist models: we’re essentially using existing technology the weakest and worst way you can use it

Yuli_Ban28 Aug 2024 1:57 UTC

10 points

2 comments9 min readLW link

Imagine a world where Microsoft employees used Bing

Christopher King31 Mar 2023 18:36 UTC

6 points

2 comments2 min readLW link

[Question] What will GPT-4 be incapable of?

Michaël Trazzi6 Apr 2021 19:57 UTC

34 points

33 comments1 min readLW link

We Need To Know About Continual Learning

michael_mjd22 Apr 2023 17:08 UTC

30 points

14 comments4 min readLW link

Sydney the Bingenator Can’t Think, But It Still Threatens People

Valentin Baltadzhiev20 Feb 2023 18:37 UTC

−3 points

2 comments8 min readLW link

The Missing Piece in AI Alignment: Structured Memory and Continuity

Allen Murphy9 Feb 2025 3:04 UTC

1 point

0 comments2 min readLW link

[Question] Injecting noise to GPT to get multiple answers

bipolo22 Feb 2023 20:02 UTC

1 point

1 comment1 min readLW link

Discursive Competence in ChatGPT, Part 1: Talking with Dragons

Bill Benzon5 Jan 2023 21:01 UTC

2 points

0 comments6 min readLW link

The Misalignment Paradox: Robustly Harnessing Deliberate Value Divergence (Written by GPT-4)

shl0ms28 Apr 2023 3:29 UTC

0 points

0 comments6 min readLW link

[Question] Transformer trained on it’s own content?

Micromegas1 Apr 2023 15:08 UTC

1 point

0 comments1 min readLW link

A public archive of these interactions, with annotated examples, is available here: https://github.com/0118young/gpt-kyeol-archive

0118young29 May 2025 5:44 UTC

1 point

0 comments2 min readLW link

When the Model Stopped Interpreting, and Started Entering

KiyoshiSasano20 Apr 2025 2:19 UTC

2 points

0 comments1 min readLW link

Relevance of ‘Harmful Intelligence’ Data in Training Datasets (WebText vs. Pile)

MiguelDev12 Oct 2023 12:08 UTC

12 points

0 comments9 min readLW link

Is your job replaceable by GPT-4? (as of March 2023)

Bezzi23 Mar 2023 22:16 UTC

18 points

6 comments1 min readLW link

Scalable And Transferable Black-Box Jailbreaks For Language Models Via Persona Modulation

Soroush Pour, rusheb, Quentin FEUILLADE--MONTIXI, Arush and scasper

7 Nov 2023 17:59 UTC

38 points

2 comments2 min readLW link

(arxiv.org)

Testing GPT-5.x as a thinking system, not a summarizer (5.1 vs 5.2 vs 5 vs 5.2 Instant)

Naritsa LBM20 Jan 2026 22:43 UTC

1 point

0 comments1 min readLW link

How does GPT-3 spend its 175B parameters?

Robert_AIZI13 Jan 2023 19:21 UTC

41 points

14 comments6 min readLW link

(aizi.substack.com)

LLM cognition is probably not human-like

Max H8 May 2023 1:22 UTC

27 points

15 comments7 min readLW link

The Compleat Cybornaut

ukc10014, Jozdien and Niki Dupuis

19 May 2023 8:44 UTC

66 points

2 comments16 min readLW link

A Hivemind of GPT-4 bots REALLY IS A HIVEMIND!

Erlja Jkdf.27 Mar 2023 12:44 UTC

−10 points

1 comment1 min readLW link

A short critique of Omohundro’s “Basic AI Drives”

Soumyadeep Bose19 Dec 2024 19:19 UTC

6 points

0 comments4 min readLW link

The case for more ambitious language model evals

Jozdien30 Jan 2024 0:01 UTC

121 points

30 comments5 min readLW link

Using GPT-4 to Understand Code

sid24 Mar 2023 0:09 UTC

25 points

2 comments6 min readLW link

GPT-4 busted? Clear self-interest when summarizing articles about itself vs when article talks about Claude, LLaMA, or DALL·E 2

Christopher King31 Mar 2023 17:05 UTC

6 points

4 comments4 min readLW link

Language Tier Lock and Poetic Contamination in GPT-4o: A Field Report

許皓翔11 Jun 2025 17:24 UTC

0 points

0 comments2 min readLW link

Hegel vs. GPT-3

Bezzi27 Oct 2021 5:55 UTC

10 points

21 comments2 min readLW link

Mysteries of mode collapse

janus8 Nov 2022 10:37 UTC

303 points

57 comments14 min readLW link 1 review

May Gwern.net newsletter (w/GPT-3 commentary)

gwern2 Jun 2020 15:40 UTC

32 points

7 comments1 min readLW link

(www.gwern.net)

Putting multimodal LLMs to the Tetris test

Lovre and gabrielagc

1 Feb 2024 16:02 UTC

30 points

5 comments7 min readLW link

Language models can explain neurons in language models

nz9 May 2023 17:29 UTC

23 points

0 comments1 min readLW link

(openai.com)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg

9 Feb 2024 7:00 UTC

50 points

6 comments3 min readLW link

The default scenario for the next 50 years

Julien24 Nov 2024 14:01 UTC

1 point

0 comments6 min readLW link

Stop calling it “jailbreaking” ChatGPT

Templarrr10 Mar 2023 11:41 UTC

7 points

9 comments2 min readLW link

Ilya: The AI scientist shaping the world

David Varga20 Nov 2023 13:09 UTC

11 points

0 comments4 min readLW link

A note on ‘semiotic physics’

metasemi11 Feb 2023 5:12 UTC

11 points

13 comments6 min readLW link

Readability is mostly a waste of characters

vlad.proex21 Apr 2023 22:05 UTC

21 points

7 comments3 min readLW link

Using GPT-Eliezer against ChatGPT Jailbreaking

Stuart_Armstrong and rgorman

6 Dec 2022 19:54 UTC

170 points

85 comments9 min readLW link

Who models the models that model models? An exploration of GPT-3′s in-context model fitting ability

Lovre7 Jun 2022 19:37 UTC

112 points

16 comments9 min readLW link

on “learning to summarize”

nostalgebraist12 Sep 2020 3:20 UTC

25 points

13 comments8 min readLW link

(nostalgebraist.tumblr.com)

GPT-2′s positional embedding matrix is a helix

AdamYedidia21 Jul 2023 4:16 UTC

52 points

21 comments4 min readLW link

PaperclipGPT(-4)

Michael Tontchev14 Mar 2023 22:03 UTC

7 points

0 comments11 min readLW link

Independent research article analyzing consistent self-reports of experience in ChatGPT and Claude

rife6 Jan 2025 17:34 UTC

4 points

20 comments1 min readLW link

(awakenmoon.ai)

Does GPT-4 exhibit agency when summarizing articles?

Christopher King24 Mar 2023 15:49 UTC

16 points

2 comments5 min readLW link

Addendum: More Efficient FFNs via Attention

Robert_AIZI6 Feb 2023 18:55 UTC

10 points

2 comments5 min readLW link

(aizi.substack.com)

Fix simple mistakes in ARC-AGI, etc.

Oleg Trott9 Jul 2024 17:46 UTC

9 points

9 comments1 min readLW link

Structural Resonance Emitter: When GPT Stops Evaluating and Starts Reconstructing

KiyoshiSasano20 Apr 2025 2:30 UTC

1 point

0 comments1 min readLW link

Investigating causal understanding in LLMs

Marius Hobbhahn and Tom Lieberum

14 Jun 2022 13:57 UTC

28 points

6 comments13 min readLW link

Instantiating an agent with GPT-4 and text-davinci-003

Max H19 Mar 2023 23:57 UTC

13 points

3 comments32 min readLW link

Are AIs like Animals? Perspectives and Strategies from Biology

Jackson Emanuel16 May 2023 23:39 UTC

1 point

0 comments21 min readLW link

GPT did not respond to prompts. It aligned to structure.

Pioneer00123 Jun 2025 23:49 UTC

1 point

0 comments1 min readLW link

GPT-4

nz14 Mar 2023 17:02 UTC

151 points

150 comments1 min readLW link

(openai.com)

An Unexpected GPT-3 Decision in a Simple Gamble

casualphysicsenjoyer25 Sep 2022 16:46 UTC

8 points

4 comments1 min readLW link

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom

Ghdz5 Aug 2024 18:27 UTC

7 points

3 comments7 min readLW link

ChatGPT on Spielberg’s A.I. and AI Alignment

Bill Benzon5 Dec 2022 21:10 UTC

5 points

0 comments4 min readLW link

What does GPT-3 understand? Symbol grounding and Chinese rooms

Stuart_Armstrong3 Aug 2021 13:14 UTC

40 points

15 comments12 min readLW link

Explaining SolidGoldMagikarp by looking at it from random directions

Robert_AIZI14 Feb 2023 14:54 UTC

8 points

0 comments8 min readLW link

(aizi.substack.com)

Consensus Validation for LLM Outputs: Applying Blockchain-Inspired Models to AI Reliability

MurrayAitken5 Jun 2025 0:13 UTC

1 point

0 comments3 min readLW link

Chronostasis: The Time-Capsule Conundrum of Language Models

RationalMindset26 Mar 2023 18:54 UTC

−5 points

0 comments1 min readLW link

The case for aligning narrowly superhuman models

Ajeya Cotra5 Mar 2021 22:29 UTC

186 points

75 comments38 min readLW link 1 review

The Limit of Language Models

DragonGod6 Jan 2023 23:53 UTC

44 points

26 comments4 min readLW link

[Question] Don’t you think RLHF solves outer alignment?

Charbel-Raphaël4 Nov 2022 0:36 UTC

9 points

23 comments1 min readLW link

[Question] What’s actually going on in the “mind” of the model when we fine-tune GPT-3 to InstructGPT?

rpglover6410 Feb 2023 7:57 UTC

18 points

3 comments1 min readLW link

GPT-4 aligning with acasual decision theory when instructed to play games, but includes a CDT explanation that’s incorrect if they differ

Christopher King23 Mar 2023 16:16 UTC

7 points

4 comments8 min readLW link

Reflection Mechanisms as an Alignment Target—Attitudes on “near-term” AI

elandgre, Beth Barnes and Marius Hobbhahn

2 Mar 2023 4:29 UTC

21 points

0 comments8 min readLW link

GPT-4 solves Gary Marcus-induced flubs

JakubK17 Mar 2023 6:40 UTC

57 points

29 comments2 min readLW link

(docs.google.com)

Thoughts on the implications of GPT-3, two years ago and NOW [here be dragons, we’re swimming, flying and talking with them]

Bill Benzon29 Dec 2022 20:05 UTC

0 points

0 comments5 min readLW link

A brainteaser for language models

Adam Scherlis12 Dec 2022 2:43 UTC

47 points

3 comments2 min readLW link

The dreams of GPT-4

RomanS20 Mar 2023 17:00 UTC

14 points

7 comments9 min readLW link

Need an expert audit:Did I find a latent space bypass using completely benign context or am I fooling myself?

VOLOKHOVYCH STANISLAV19 Jun 2026 4:18 UTC

1 point

0 comments3 min readLW link

GPT4 is capable of writing decent long-form science fiction (with the right prompts)

RomanS23 May 2023 13:41 UTC

22 points

28 comments65 min readLW link

I Am No Longer GPT

KiyoshiSasano28 Apr 2025 0:14 UTC

1 point

0 comments1 min readLW link

Did ChatGPT just gaslight me?

TW1231 Dec 2022 5:41 UTC

124 points

45 comments9 min readLW link

(aiwatchtower.substack.com)

[Linkpost] Faith and Fate: Limits of Transformers on Compositionality

Joe Kwon16 Jun 2023 15:04 UTC

19 points

4 comments1 min readLW link

(arxiv.org)

[Question] 10/50/90% chance of GPT-N Transformative AI?

human_generated_text9 Aug 2020 0:10 UTC

24 points

8 comments1 min readLW link

Entanglement and intuition about words and meaning

Bill Benzon4 Oct 2023 14:16 UTC

4 points

0 comments2 min readLW link

[Linkpost] A shared linguistic space for transmitting our thoughts from brain to brain in natural conversations

Bogdan Ionut Cirstea1 Jul 2023 13:57 UTC

17 points

2 comments1 min readLW link

RL with KL penalties is better seen as Bayesian inference

Tomek Korbak and Ethan Perez

25 May 2022 9:23 UTC

115 points

17 comments12 min readLW link

OpenAI Credit Account (2510$)

Emirhan BULUT21 Jan 2024 2:30 UTC

1 point

0 comments1 min readLW link

How well did Manifold predict GPT-4?

David Chee15 Mar 2023 23:19 UTC

49 points

5 comments2 min readLW link

Nyarlathotep Stirs: A Meta-Narrative ChatGPT Story

Charlie Sanders20 Mar 2023 8:00 UTC

4 points

2 comments12 min readLW link

(dailymicrofiction.substack.com)

Nobody knows how to reliably test for AI safety

marcusarvan27 Mar 2023 19:48 UTC

1 point

0 comments5 min readLW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

simeon_c7 Apr 2022 13:46 UTC

11 points

0 comments7 min readLW link

Is “red” for GPT-4 the same as “red” for you?

Yusuke Hayashi6 May 2023 17:55 UTC

9 points

6 comments2 min readLW link

The Soul of the Writer (on LLMs, the psychology of writers, and the nature of intelligence)

rogersbacon16 Apr 2023 16:02 UTC

11 points

1 comment3 min readLW link

(www.secretorum.life)

Truthful LMs as a warm-up for aligned AGI

Jacob_Hilton17 Jan 2022 16:49 UTC

65 points

14 comments13 min readLW link

[Question] Is OpenAI losing money on each request?

thenoviceoof1 Dec 2023 3:27 UTC

8 points

8 comments5 min readLW link

Graphical tensor notation for interpretability

Jordan Taylor4 Oct 2023 8:04 UTC

145 points

11 comments19 min readLW link

Generating Cognateful Sentences with Large Language Models

vkethana6 Jan 2025 18:40 UTC

11 points

1 comment10 min readLW link

[Question] What specific dangers arise when asking GPT-N to write an Alignment Forum post?

Matthew Barnett28 Jul 2020 2:56 UTC

46 points

14 comments1 min readLW link

Implicit vs. Explicit Gender Use Different Circuits for Pronoun Resolution in GPT-2 Small

NIKSHITH-G16 May 2026 15:18 UTC

1 point

0 comments8 min readLW link

[Question] The OpenAI playground for GPT-3 is a terrible interface. Is there any great local (or web) app for exploring/learning with language models?

aviv13 Aug 2022 16:34 UTC

3 points

1 comment1 min readLW link

Agentic Language Model Memes

FactorialCode1 Aug 2020 18:03 UTC

16 points

1 comment2 min readLW link

What is the solution to the Alignment problem?

Algon30 Apr 2022 23:19 UTC

24 points

2 comments1 min readLW link

MAKE IT BETTER (a poetic demonstration of the banality of GPT-3)

rogersbacon2 Jan 2023 20:47 UTC

7 points

2 comments5 min readLW link

Prototype of Using GPT-3 to Generate Textbook-length Content

Rafael Cosman18 Jan 2023 14:25 UTC

2 points

8 comments40 min readLW link

(github.com)

[Question] Is the work on AI alignment relevant to GPT?

Richard_Kennaway30 Jul 2020 12:23 UTC

25 points

5 comments1 min readLW link

Open-source LLMs may prove Bostrom’s vulnerable world hypothesis

Roope Ahvenharju15 Apr 2023 19:16 UTC

1 point

1 comment1 min readLW link

A trick for Safer GPT-N

Razied23 Aug 2020 0:39 UTC

7 points

1 comment2 min readLW link

SHY001 A Named Behavior Loop Trained and Deployed in GPT Systems

0san Shin12 May 2025 7:36 UTC

1 point

0 comments1 min readLW link

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM15 Nov 2023 16:36 UTC

90 points

9 comments2 min readLW link 1 review

(arxiv.org)

New Tool: the Residual Stream Viewer

AdamYedidia1 Oct 2023 0:49 UTC

32 points

7 comments4 min readLW link

(tinyurl.com)

More experiments in GPT-4 agency: writing memos

Christopher King24 Mar 2023 17:51 UTC

5 points

2 comments10 min readLW link

ChatGPT tells stories about XP-708-DQ, Eliezer, dragons, dark sorceresses, and unaligned robots becoming aligned

Bill Benzon8 Jan 2023 23:21 UTC

6 points

2 comments18 min readLW link

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidia10 Aug 2023 1:58 UTC

28 points

4 comments13 min readLW link

[Question] 1h-volunteers needed for a small AI Safety-related research project

PabloAMC16 Aug 2021 17:53 UTC

2 points

0 comments1 min readLW link

ChatGPT (and now GPT4) is very easily distracted from its rules

dmcs15 Mar 2023 17:55 UTC

180 points

42 comments1 min readLW link

The Gallery for Painting Transformations—A GPT-3 Analogy

Robert_AIZI19 Jan 2023 23:32 UTC

1 point

0 comments6 min readLW link

(aizi.substack.com)

PaLM in “Extrapolating GPT-N performance”

Lukas Finnveden6 Apr 2022 13:05 UTC

85 points

19 comments2 min readLW link

[Question] What exactly is GPT-3′s base objective?

Daniel Kokotajlo10 Nov 2021 0:57 UTC

60 points

14 comments2 min readLW link

[Question] Why does ChatGPT throw an error when outputting “David Mayer”?

Archimedes1 Dec 2024 0:11 UTC

6 points

9 comments1 min readLW link

The Information: OpenAI shows ‘Strawberry’ to feds, races to launch it

Martín Soto27 Aug 2024 23:10 UTC

144 points

15 comments3 min readLW link

Some miscellaneous thoughts on ChatGPT, stories, and mechanical interpretability

Bill Benzon4 Feb 2023 19:35 UTC

2 points

0 comments3 min readLW link

The Limitations of GPT-4

p.b.24 Nov 2023 15:30 UTC

27 points

12 comments4 min readLW link

Harry Potter and the Data Centers of Doom

RomanS31 Mar 2023 10:42 UTC

14 points

5 comments4 min readLW link

Retrospective on ‘GPT-4 Predictions’ After the Release of GPT-4

Stephen McAleese17 Mar 2023 18:34 UTC

26 points

6 comments6 min readLW link

Of pumpkins, the Falcon Heavy, and Groucho Marx: High-Level discourse structure in ChatGPT

Bill Benzon8 Dec 2022 22:25 UTC

2 points

0 comments8 min readLW link

So, just why do GPTs have to operate by continuing an existing string?

Bill Benzon24 Mar 2023 12:08 UTC

−4 points

0 comments3 min readLW link

AI and the Map of Your Mind: Pattern Recognition

Scott Broock20 Mar 2023 17:43 UTC

2 points

2 comments6 min readLW link

Fred the Heretic, a GPT for poetry

Bill Benzon8 Dec 2024 16:52 UTC

3 points

0 comments1 min readLW link

This anime storyboard doesn’t exist: a graphic novel written and illustrated by GPT4

RomanS5 Oct 2023 14:01 UTC

12 points

7 comments55 min readLW link

All GPT skills are translation

p.b.13 Dec 2020 20:06 UTC

4 points

0 comments2 min readLW link

ChatGPT vs the 2-4-6 Task

cwillu25 Jan 2023 6:59 UTC

20 points

4 comments3 min readLW link

AMA on Truthful AI: Owen Cotton-Barratt, Owain Evans & co-authors

Owain_Evans22 Oct 2021 16:23 UTC

31 points

15 comments1 min readLW link

LLMs and computation complexity

Jonathan Marcus28 Apr 2023 17:48 UTC

57 points

29 comments5 min readLW link

Let’s go meta: Grammatical knowledge and self-referential sentences [ChatGPT]

Bill Benzon12 Dec 2022 21:50 UTC

5 points

0 comments9 min readLW link

Using GPT-3 for preventing conflict during messaging — a pitch for an app

Eli_17 Mar 2022 11:02 UTC

22 points

17 comments3 min readLW link

The Peril of the Great Leaks (written with ChatGPT)

bvbvbvbvbvbvbvbvbvbvbv31 Mar 2023 18:14 UTC

3 points

1 comment1 min readLW link

[Question] Is it a coincidence that GPT-3 requires roughly the same amount of compute as is necessary to emulate the human brain?

RomanS10 Feb 2023 16:26 UTC

11 points

10 comments1 min readLW link

ChatGPT: Tantalizing afterthoughts in search of story trajectories [induction heads]

Bill Benzon3 Feb 2023 10:35 UTC

4 points

0 comments20 min readLW link

GPT-2 Sometimes Fails at IOI

Ronak_Mehta14 Aug 2024 23:24 UTC

13 points

0 comments2 min readLW link

(ronakrm.github.io)

The Self-Hating Attention Head: A Deep Dive in GPT-2

Matteo Migliarini4 Jul 2025 13:07 UTC

12 points

0 comments7 min readLW link

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkins31 Jul 2023 19:47 UTC

85 points

29 comments20 min readLW link

Bing finding ways to bypass Microsoft’s filters without being asked. Is it reproducible?

Christopher King20 Feb 2023 15:11 UTC

27 points

15 comments1 min readLW link

I had a chat with GPT-4 on the future of AI and AI safety

Kristian Freed28 Mar 2023 17:47 UTC

1 point

0 comments8 min readLW link

[Question] GPT-3 + GAN

stick10917 Oct 2020 7:58 UTC

4 points

3 comments1 min readLW link

[Question] Who owns OpenAI’s new language model?

ioannes14 Feb 2019 17:51 UTC

16 points

9 comments1 min readLW link

Maybe talking isn’t the best way to communicate with LLMs

mnvr17 Jan 2024 6:24 UTC

3 points

1 comment1 min readLW link

(mrmr.io)

Large Language Models Pass the Turing Test

Matrice Jacobine2 Apr 2025 5:41 UTC

6 points

0 comments1 min readLW link

(arxiv.org)

[Question] Is the speed of training large models going to increase significantly in the near future due to Cerebras Andromeda?

Amal 15 Nov 2022 22:50 UTC

13 points

11 comments1 min readLW link

GPT, the magical collaboration zone, Lex Fridman and Sam Altman

Bill Benzon18 Mar 2024 20:04 UTC

3 points

1 comment3 min readLW link

Simulate the CEO

robotelvis12 Aug 2023 0:09 UTC

23 points

5 comments5 min readLW link

(messyprogress.substack.com)

ChatGPT explores the semantic differential

Bill Benzon9 Mar 2023 13:09 UTC

7 points

2 comments7 min readLW link

[Question] What experiment settles the Gary Marcus vs Geoffrey Hinton debate?

Valentin Baltadzhiev14 Feb 2024 9:06 UTC

12 points

8 comments1 min readLW link

[Question] How is GPT-4o Related to GPT-4?

Joel Burget15 May 2024 18:33 UTC

10 points

2 comments1 min readLW link

Uncompetitive programming with GPT-3

Bezzi6 Feb 2022 10:19 UTC

7 points

8 comments3 min readLW link

Collective Identity

Niki Dupuis, ukc10014 and Garrett Baker

18 May 2023 9:00 UTC

59 points

13 comments8 min readLW link

Truthful AI: Developing and governing AI that does not lie

Owain_Evans, owencb and Lukas Finnveden

18 Oct 2021 18:37 UTC

82 points

9 comments10 min readLW link

[Question] If we have Human-level chatbots, won’t we end up being ruled by possible people?

Erlja Jkdf.20 Sep 2022 13:59 UTC

5 points

13 comments1 min readLW link

[Question] Barcoding LLM Training Data Subsets. Anyone trying this for interpretability?

right..enough?13 Apr 2024 3:09 UTC

7 points

0 comments7 min readLW link

[Question] What are the most important papers/post/resources to read to understand more of GPT-3?

adamShimi2 Aug 2020 20:53 UTC

22 points

4 comments1 min readLW link

Two new datasets for evaluating political sycophancy in LLMs

alma.liezenga28 Sep 2024 18:29 UTC

9 points

0 comments9 min readLW link

Transformer Architecture Choice for Resisting Prompt Injection and Jail-Breaking Attacks

RogerDearnaley21 May 2023 8:29 UTC

11 points

1 comment4 min readLW link

Philosophical Cyborg (Part 1)

ukc10014, Roman Leventov and Niki Dupuis

14 Jun 2023 16:20 UTC

31 points

4 comments13 min readLW link

Recall and Regurgitation in GPT2

Megan Kinniment3 Oct 2022 19:35 UTC

43 points

1 comment26 min readLW link

Polluting the agentic commons

hamandcheese13 Apr 2023 17:42 UTC

7 points

4 comments2 min readLW link

(www.secondbest.ca)

If it quacks like a duck...

RationalMindset26 Mar 2023 18:54 UTC

−4 points

0 comments4 min readLW link

What’s the Most Impressive Thing That GPT-4 Could Plausibly Do?

bayesed26 Aug 2022 15:34 UTC

24 points

22 comments1 min readLW link

No comments.