The LLM shoggoth meme is weirder than you think

HedonicEscalator · 19 Jun 2026 23:35 UTC
119 points
8 comments · 7 min read
(hedonicescalator.substack.com)

How I think developers of frontier AI systems and regulators ought to act in the face of existential AI risk

WilliamKiely · 19 Jun 2026 22:22 UTC
12 points
0 comments · 12 min read

Hyperstition as the Natural Enemy of Rationality

alseph · 19 Jun 2026 21:12 UTC
32 points
7 comments · 3 min read

World-modeling the US vs. Anthropic Standoff on Claude Fable

dschwarz · 19 Jun 2026 20:04 UTC
18 points
3 comments · 8 min read

Thoughts on Likelihood of Existential Risks by Misaligned AIs

Ishan Khire · 19 Jun 2026 19:17 UTC
3 points
0 comments · 6 min read
(ishankhire.substack.com)

Why should AI be moral?

Zach Thornton · 19 Jun 2026 19:13 UTC
12 points
2 comments · 9 min read

AI Safety Ecosystem Research notes

Eneasz · 19 Jun 2026 18:21 UTC
31 points
1 comment · 8 min read

A brief list of ways AI safety efforts could be net negative

Elias Schmied · 19 Jun 2026 16:12 UTC
28 points
4 comments · 2 min read

Online >> real life for spreading ideas

Bill Jackson · 19 Jun 2026 15:44 UTC
12 points
1 comment · 2 min read

Typical Minds Aren’t

Gordon Seidoh Worley · 19 Jun 2026 15:11 UTC
5 points
6 comments · 2 min read
(www.uncertainupdates.com)

San Silvestro

Tomás B. · 19 Jun 2026 14:54 UTC
39 points
1 comment · 14 min read
(open.substack.com)

Claude Fable 5 and Mythos 5: Capabilities

Zvi · 19 Jun 2026 14:40 UTC
30 points
2 comments · 38 min read
(thezvi.wordpress.com)

The one-week sprint

Daniel Tan · 19 Jun 2026 12:46 UTC
39 points
1 comment · 2 min read

Futarchy is insecure without a trusted gatekeeper

distbit · 19 Jun 2026 12:22 UTC
2 points
0 comments · 10 min read

Patching ~All Security-Relevant Open-Source Software? [niplav 2025]

Quinn · 19 Jun 2026 12:13 UTC
15 points
1 comment · 1 min read
(forum.effectivealtruism.org)

Cosmological Odyssey

breaker25 · 19 Jun 2026 5:06 UTC
−12 points
1 comment · 3 min read

Research agenda: Interpretive debate

Shi · 18 Jun 2026 23:46 UTC
30 points
0 comments · 7 min read

Does it feel any different to be reverse-chiral life?

jessicata · 18 Jun 2026 22:56 UTC
10 points
0 comments · 10 min read

Reinforcement learning towards broadly and persistently beneficial models

papetoast · 18 Jun 2026 22:11 UTC
19 points
0 comments · 1 min read
(alignment.openai.com)

The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t

18 Jun 2026 21:21 UTC
57 points
4 comments · 5 min read
(blog.redwoodresearch.org)

CoT-forcing promptware

Bruce Middleton · 18 Jun 2026 19:33 UTC
2 points
0 comments · 2 min read

AI that represents you can’t be neutral.

agulaya24 · 18 Jun 2026 18:50 UTC
−1 points
2 comments · 3 min read

On “Model Organisms”

J Bostock · 18 Jun 2026 18:42 UTC
31 points
1 comment · 6 min read

Introduction: Gaussian Natural Latents

Haru · 18 Jun 2026 18:41 UTC
41 points
2 comments · 3 min read

GDM AI Control Roadmap

18 Jun 2026 16:50 UTC
81 points
2 comments · 1 min read

Contra Pace on When to Apologize

Zack_M_Davis · 18 Jun 2026 16:49 UTC
54 points
21 comments · 6 min read
(zackmdavis.net)

Your Model Organisms Might Be Fried

18 Jun 2026 16:18 UTC
84 points
6 comments · 7 min read

Shard narcissism as delusion of unembededness

Fernand0 · 18 Jun 2026 14:29 UTC
10 points
1 comment · 4 min read

AI #173: AI Pauses

Zvi · 18 Jun 2026 13:40 UTC
35 points
2 comments · 47 min read
(thezvi.wordpress.com)

War of Dots: CRUSHING my opponents with FACTS and LOGIC

momom2 · 18 Jun 2026 12:07 UTC
17 points
2 comments · 7 min read

How far do open weights trail the frontier?

RobinHa · 18 Jun 2026 11:01 UTC
22 points
4 comments · 1 min read
(robinhaselhorst.com)

Karlsruhe – LW/ACX Meetup – June 2026

volis · 18 Jun 2026 9:55 UTC
1 point
0 comments · 1 min read

GLM 5.2 playing text adventures

kqr · 18 Jun 2026 7:23 UTC
14 points
1 comment · 1 min read
(entropicthoughts.com)

Leveraged on being right

Ben Pace, the Vacationing Vagabond · 18 Jun 2026 6:51 UTC
74 points
7 comments · 3 min read

Vulnerabilities and exploits: where are we headed?

tchauvin · 18 Jun 2026 5:49 UTC
9 points
0 comments · 5 min read
(tchauvin.com)

Agents are under-elicited: A case study in optimization tasks

18 Jun 2026 2:39 UTC
17 points
1 comment · 7 min read
(fulcrum.inc)

A preliminary experiment regarding consistency as a measure of conceptual abilities in language models

Chi Nguyen · 17 Jun 2026 22:56 UTC
20 points
3 comments · 7 min read
(casparoesterheld.com)

Kraków Aligned

17 Jun 2026 20:21 UTC
1 point
0 comments · 1 min read

Gears for political races

Tom Smith · 17 Jun 2026 20:19 UTC
163 points
19 comments · 14 min read

“Did you lie?” Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms

17 Jun 2026 18:43 UTC
30 points
0 comments · 6 min read
(arxiv.org)

Porting MACHIAVELLI To Inspect

Koby Lewis · 17 Jun 2026 17:58 UTC
7 points
0 comments · 4 min read
(kobylewis.net)

Several frontier models are substantially prefill aware

17 Jun 2026 17:41 UTC
59 points
2 comments · 5 min read

Lock-In Risk Needs More Researchers. Here’s Where to Start

Alfie Lamerton · 17 Jun 2026 17:33 UTC
12 points
2 comments · 13 min read

A Geometric Account of Activation Steering through Angle–Norm Decomposition

17 Jun 2026 15:23 UTC
9 points
0 comments · 5 min read
(atmyre.github.io)

The Once And Future Fable #3: Fix This Code

Zvi · 17 Jun 2026 14:10 UTC
62 points
9 comments · 21 min read
(thezvi.wordpress.com)

Alignment pretraining could backfire

Alexandre Variengien · 17 Jun 2026 13:52 UTC
43 points
8 comments · 1 min read

Toward a Kantian refutation of Agent Foundations

Fernand0 · 17 Jun 2026 13:30 UTC
9 points
0 comments · 8 min read

Illusionists should try to build hedonium

Jack Thompson · 17 Jun 2026 12:25 UTC
−3 points
6 comments · 9 min read
(jacktlab.substack.com)

Omission Attacks Project Proposal

Chris Harig · 17 Jun 2026 7:08 UTC
2 points
0 comments · 3 min read

The Financial Ledger Theory of Apologies

Ben Pace, the Vacationing Vagabond · 17 Jun 2026 6:57 UTC
46 points
9 comments · 4 min read