Coup is the Pareto-optimal social game

Daniel Tan · 21 Jun 2026 23:31 UTC
22 points
7 comments · 2 min read · LW link

Introducing MonitoringBench

monika_j · 21 Jun 2026 18:43 UTC
41 points
0 comments · 6 min read · LW link

How persona training could fail

Simon Lermen · 21 Jun 2026 16:38 UTC
13 points
0 comments · 4 min read · LW link

A high-level model of AI bargaining

Anthony DiGiovanni · 21 Jun 2026 15:37 UTC
17 points
1 comment · 5 min read · LW link

Policy changes should be rolled out gradually

Yair Halberstadt · 21 Jun 2026 11:07 UTC
28 points
2 comments · 3 min read · LW link

A misalignment taxonomy

Alec Harris · 21 Jun 2026 10:20 UTC
13 points
2 comments · 3 min read · LW link

The Cookie Monster Explains AI Safety

michaelwaves · 21 Jun 2026 0:52 UTC
12 points
2 comments · 2 min read · LW link

Google Can’t Math Parsecs

jefftk · 21 Jun 2026 0:30 UTC
96 points
0 comments · 1 min read · LW link
(www.jefftk.com)

How are there 0 studies (maybe 1) on sex-concordant hormone therapy?

Util · 20 Jun 2026 22:36 UTC
14 points
0 comments · 3 min read · LW link

Against Planet-Eating Nanoreplicators

SurvivalBias · 20 Jun 2026 20:27 UTC
10 points
7 comments · 5 min read · LW link

How transparent is DiffusionGemma (and why it matters)

20 Jun 2026 20:05 UTC
72 points
2 comments · 4 min read · LW link

Animal Futures Forecasting Tournament

david reinstein · 20 Jun 2026 19:39 UTC
14 points
2 comments · 1 min read · LW link

The Invisible Side of AI Governance

Charbel-Raphaël · 20 Jun 2026 18:54 UTC
100 points
4 comments · 14 min read · LW link

Would anybody here be interested in a “mistake postmortem” discussion group?

SK2 · 20 Jun 2026 12:03 UTC
50 points
7 comments · 4 min read · LW link

Unchickenous Apricot Berry Cake

jefftk · 20 Jun 2026 2:20 UTC
22 points
1 comment · 1 min read · LW link
(www.jefftk.com)

The LLM shoggoth meme is weirder than you think

HedonicEscalator · 19 Jun 2026 23:35 UTC
126 points
8 comments · 7 min read · LW link
(hedonicescalator.substack.com)

How I think developers of frontier AI systems and regulators ought to act in the face of existential AI risk

WilliamKiely · 19 Jun 2026 22:22 UTC
12 points
0 comments · 12 min read · LW link

Hyperstition as the Natural Enemy of Rationality

alseph · 19 Jun 2026 21:12 UTC
29 points
8 comments · 3 min read · LW link

World-modeling the US vs. Anthropic Standoff on Claude Fable

dschwarz · 19 Jun 2026 20:04 UTC
20 points
4 comments · 8 min read · LW link

Thoughts on Likelihood of Existential Risks by Misaligned AIs

Ishan Khire · 19 Jun 2026 19:17 UTC
3 points
0 comments · 6 min read · LW link
(ishankhire.substack.com)

Why should AI be moral?

Zach Thornton · 19 Jun 2026 19:13 UTC
12 points
3 comments · 9 min read · LW link

AI Safety Ecosystem Research notes

Eneasz · 19 Jun 2026 18:21 UTC
31 points
1 comment · 8 min read · LW link

A brief list of ways AI safety efforts could be net negative

Elias Schmied · 19 Jun 2026 16:12 UTC
28 points
4 comments · 2 min read · LW link

Online >> real life for spreading ideas

Bill Jackson · 19 Jun 2026 15:44 UTC
12 points
1 comment · 2 min read · LW link

Typical Minds Aren’t

Gordon Seidoh Worley · 19 Jun 2026 15:11 UTC
5 points
6 comments · 2 min read · LW link
(www.uncertainupdates.com)

San Silvestro

Tomás B. · 19 Jun 2026 14:54 UTC
39 points
1 comment · 14 min read · LW link
(open.substack.com)

Claude Fable 5 and Mythos 5: Capabilities

Zvi · 19 Jun 2026 14:40 UTC
30 points
2 comments · 38 min read · LW link
(thezvi.wordpress.com)

The one-week sprint

Daniel Tan · 19 Jun 2026 12:46 UTC
41 points
4 comments · 2 min read · LW link

Futarchy is insecure without a trusted gatekeeper

distbit · 19 Jun 2026 12:22 UTC
2 points
0 comments · 10 min read · LW link

Patching ~All Security-Relevant Open-Source Software? [niplav 2025]

Quinn · 19 Jun 2026 12:13 UTC
15 points
1 comment · 1 min read · LW link
(forum.effectivealtruism.org)

Cosmological Odyssey

breaker25 · 19 Jun 2026 5:06 UTC
−12 points
1 comment · 3 min read · LW link

Research agenda: Interpretive debate

Shi · 18 Jun 2026 23:46 UTC
34 points
0 comments · 7 min read · LW link

Does it feel any different to be reverse-chiral life?

jessicata · 18 Jun 2026 22:56 UTC
10 points
0 comments · 10 min read · LW link

Reinforcement learning towards broadly and persistently beneficial models

papetoast · 18 Jun 2026 22:11 UTC
19 points
0 comments · 1 min read · LW link
(alignment.openai.com)

The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t

18 Jun 2026 21:21 UTC
57 points
4 comments · 5 min read · LW link
(blog.redwoodresearch.org)

CoT-forcing promptware

Bruce Middleton · 18 Jun 2026 19:33 UTC
2 points
0 comments · 2 min read · LW link

AI that represents you can’t be neutral.

agulaya24 · 18 Jun 2026 18:50 UTC
−1 points
2 comments · 3 min read · LW link

On “Model Organisms”

J Bostock · 18 Jun 2026 18:42 UTC
33 points
1 comment · 6 min read · LW link

Introduction: Gaussian Natural Latents

Haru · 18 Jun 2026 18:41 UTC
41 points
2 comments · 3 min read · LW link

GDM AI Control Roadmap

18 Jun 2026 16:50 UTC
82 points
2 comments · 1 min read · LW link

Contra Pace on When to Apologize

Zack_M_Davis · 18 Jun 2026 16:49 UTC
57 points
27 comments · 6 min read · LW link
(zackmdavis.net)

Your Model Organisms Might Be Fried

18 Jun 2026 16:18 UTC
92 points
6 comments · 7 min read · LW link

Shard narcissism as delusion of unembededness

Fernand0 · 18 Jun 2026 14:29 UTC
10 points
1 comment · 4 min read · LW link

AI #173: AI Pauses

Zvi · 18 Jun 2026 13:40 UTC
35 points
2 comments · 47 min read · LW link
(thezvi.wordpress.com)

War of Dots: CRUSHING my opponents with FACTS and LOGIC

momom2 · 18 Jun 2026 12:07 UTC
17 points
2 comments · 7 min read · LW link

How far do open weights trail the frontier?

RobinHa · 18 Jun 2026 11:01 UTC
22 points
4 comments · 1 min read · LW link
(robinhaselhorst.com)

Kar­ls­ruhe—LW/​ACX Meetup—June 2026

volis · 18 Jun 2026 9:55 UTC
1 point
0 comments · 1 min read · LW link

GLM 5.2 playing text adventures

kqr · 18 Jun 2026 7:23 UTC
14 points
1 comment · 1 min read · LW link
(entropicthoughts.com)

Leveraged on being right

Ben Pace, the Vacationing Vagabond · 18 Jun 2026 6:51 UTC
82 points
7 comments · 3 min read · LW link

Vulnerabilities and exploits: where are we headed?

tchauvin · 18 Jun 2026 5:49 UTC
9 points
0 comments · 5 min read · LW link
(tchauvin.com)