Goodhart’s Law and a Minimum Viable Sugarscape: Karpathy Pattern ABM Autoresearch

Raven Of Empire
23 May 2026 23:24 UTC
2 points
0 comments
6 min read

Veganism is Virtuous, not Obligatory

Hide
23 May 2026 23:19 UTC
10 points
10 comments
25 min read
(hidefromit.substack.com)

Low Expectancy is Not a Confidence Problem

Alex A
23 May 2026 22:48 UTC
13 points
1 comment
2 min read

Basic principles for dressing better.

spookycat
23 May 2026 19:59 UTC
69 points
24 comments
5 min read

Boltzmann brains, like Doomsday, require no explaining

Steff
23 May 2026 16:16 UTC
−18 points
3 comments
12 min read

Probabilities are not the right concept

David Matolcsi
23 May 2026 16:10 UTC
82 points
30 comments
15 min read

Your Left Brain Doesn’t Trade With Your Right

Alexander Gietelink Oldenziel
23 May 2026 15:12 UTC
53 points
22 comments
5 min read

Out-of-Context Reasoning (OOCR) in LLMs: A Short Primer and Reading List

Owain_Evans
23 May 2026 2:46 UTC
41 points
2 comments
5 min read
(outofcontextreasoning.com)

Capitalism is only the first of our problems

Ian Matson
23 May 2026 2:22 UTC
−9 points
1 comment
5 min read

A political movement will save us from extinction

rohantohab
23 May 2026 2:12 UTC
1 point
1 comment
1 min read
(open.substack.com)

How should we update on AI-enabled coups post-Mythos?

callumzc
23 May 2026 2:10 UTC
16 points
3 comments
5 min read

How Humans Will Achieve Immortality (by transcending biology and becoming machine intelligence)

SeymourJReid
23 May 2026 2:08 UTC
−4 points
0 comments
8 min read

PLA Daily Translation: Reflections on Warfare Brought by AGI

eeeee
23 May 2026 0:52 UTC
51 points
1 comment
11 min read

Can Large Language Models Identify Novel Threats? Part 1: Mirror Life and the Classification Gap

Failfinder70
23 May 2026 0:15 UTC
8 points
0 comments
3 min read

The Leaky AI Safety Pipeline

Nikhil Kalidasu
23 May 2026 0:14 UTC
12 points
0 comments
5 min read
(crosscurrents.ink)

The Fundamentals of Cogitism: Grounding Ethics in the Nature of Consciousness

ArtiFabian
23 May 2026 0:13 UTC
3 points
4 comments
4 min read
(sikerspot.com)

Looking for backdoors in Jane Street LLMs

Cipolla
23 May 2026 0:06 UTC
16 points
0 comments
14 min read

Will we really put data centers in space?

22 May 2026 23:51 UTC
91 points
23 comments
5 min read
(www.forethought.org)

We made a map of the doom debate

22 May 2026 23:24 UTC
40 points
9 comments
6 min read

Which technical AI safety fields are going to be automated first?

Chamod Kalupahana
22 May 2026 17:32 UTC
21 points
5 comments
6 min read

Gemini 3.5 Flash Looks Good For How Fast It Is

Zvi
22 May 2026 17:30 UTC
34 points
4 comments
7 min read
(thezvi.wordpress.com)

The AI Industrial Explosion — Part 3: Going faster

djbinder
22 May 2026 16:38 UTC
19 points
2 comments
14 min read
(defensesindepth.bio)

Strong Longtermism Is Simply Correct

Bentham's Bulldog
22 May 2026 15:57 UTC
1 point
1 comment
19 min read

Notes on Collaborating with Claude Opus

Nissa Seru
22 May 2026 15:35 UTC
40 points
2 comments
1 min read

Proposal for “Timelines to what”: DIAL distribution

tlevin
22 May 2026 14:40 UTC
21 points
0 comments
1 min read

AI is Not Normal Technology

Olivia Scharfman
22 May 2026 10:27 UTC
16 points
2 comments
19 min read

Counting Arguments in AI Safety

Samuel Ratnam
22 May 2026 8:43 UTC
16 points
13 comments
3 min read
(substack.com)

Insurance Premiums To The Moon

PossiblyElaine
22 May 2026 6:09 UTC
18 points
1 comment
4 min read
(possiblyelaine.substack.com)

Moderator’s Principle of Least Surprise

Czynski
21 May 2026 21:02 UTC
21 points
6 comments
6 min read
(dangeroussincerity.substack.com)

You can opt out of allergies

Rattengift
21 May 2026 19:54 UTC
31 points
12 comments
1 min read

Possible red is red

avturchin
21 May 2026 17:30 UTC
1 point
9 comments
4 min read

Apr-May 2026 AI Security via Formal Methods

Quinn
21 May 2026 15:40 UTC
12 points
0 comments
1 min read
(newsletter.for-all.dev)

An Introduction to Neo-Fatalism

julius vidal
21 May 2026 15:18 UTC
4 points
0 comments
11 min read
(cyberzenics.substack.com)

Loss of Oversight: How AI Systems May Become Harder to Audit, Monitor, and Investigate

21 May 2026 14:52 UTC
83 points
0 comments
6 min read
(www.aisi.gov.uk)

AI #169: New Knowledge

Zvi
21 May 2026 13:20 UTC
39 points
10 comments
47 min read
(thezvi.wordpress.com)

What am I, if not an AI?

makiba
21 May 2026 13:14 UTC
84 points
14 comments
7 min read

Learned Chain-of-Thought Obfuscation Generalises to Unseen Tasks

21 May 2026 10:11 UTC
31 points
0 comments
5 min read
(arxiv.org)

Why is having a child inherently selfish?

JacksonTan
21 May 2026 3:51 UTC
0 points
1 comment
1 min read

Numb mental state shifts

KatjaGrace
21 May 2026 3:50 UTC
36 points
2 comments
1 min read
(worldspiritsockpuppet.com)

Women should be able to open things

KatjaGrace
21 May 2026 3:50 UTC
340 points
134 comments
2 min read
(worldspiritsockpuppet.com)

Why are people so scared of causing fear?

KatjaGrace
21 May 2026 3:50 UTC
39 points
4 comments
2 min read
(worldspiritsockpuppet.com)

Document-tuning instills durable animal compassion in LLMs (and generalizes to humans)

21 May 2026 3:29 UTC
11 points
0 comments
6 min read

What About Us?

James Stephen Brown
21 May 2026 2:48 UTC
4 points
0 comments
5 min read
(nonzerosum.games)

The Whole Kitten-Cavoodle

James Stephen Brown
21 May 2026 2:32 UTC
5 points
0 comments
5 min read

Why does off-model SFT degrade capabilities?

21 May 2026 0:35 UTC
42 points
9 comments
6 min read

If I Were Emperor of New AI Safety Researcher Training...

Lorxus
20 May 2026 23:10 UTC
21 points
3 comments
8 min read
(tiled-with-pentagons.blogspot.com)

theory uplift differentially benefits safety & is underleveraged

yudhister
20 May 2026 21:43 UTC
133 points
14 comments
1 min read

Singular Learning Theory Comprehensive − 1

Agastya Agrawal
20 May 2026 20:00 UTC
35 points
1 comment
12 min read

Sparse Efficiency vs. Superposition: The Interpretability Tradeoff

hillz
20 May 2026 19:14 UTC
8 points
0 comments
1 min read

The Case for Evaluating Model Behaviors

jsteinhardt
20 May 2026 18:42 UTC
40 points
3 comments
3 min read