ASI motives and the ontonormative goods (re IABIED’s core argument)

Zsolt Tanko · 3 May 2026 23:38 UTC
4 points
4 comments · 4 min read · LW link

How did ‘large’ language models get that way? The role of Transformers and Pretraining in GPT

Oliver Sourbut · 3 May 2026 21:35 UTC
16 points
0 comments · 7 min read · LW link
(www.oliversourbut.net)

Dairy cows make their misery expensive (but their calves can’t)

Elizabeth · 3 May 2026 19:20 UTC
159 points
1 comment · 6 min read · LW link
(acesounderglass.com)

[Question] Looking for papers on general formalizations of “agency”

lovagrus · 3 May 2026 18:32 UTC
12 points
1 comment · 2 min read · LW link

Why I made Engineering Enigmas

kqr · 3 May 2026 18:04 UTC
13 points
0 comments · 3 min read · LW link

Deontological bars should reference the actor’s beliefs

TFD · 3 May 2026 15:09 UTC
8 points
6 comments · 3 min read · LW link

We don’t learn numbers from set cardinality

azergante · 3 May 2026 11:33 UTC
4 points
15 comments · 3 min read · LW link

MHC Interp #1: Previous-Token Heads Become Attention Sinks Under Manifold-Constrained Hyper-Connections

Realmbird · 3 May 2026 11:06 UTC
21 points
3 comments · 5 min read · LW link

The Repugnant Lifespan Conclusion

XelaP · 3 May 2026 9:22 UTC
12 points
20 comments · 3 min read · LW link

Pursuing the target

Adam Zerner · 3 May 2026 7:59 UTC
30 points
1 comment · 2 min read · LW link

Paraphrasing Is (At Best) a Partial Defence Against Steganography in LLMs

3 May 2026 7:53 UTC
14 points
0 comments · 8 min read · LW link

LLMs Choose the Safer Gamble Yet Price the Riskier One Higher

Jonathan Dang · 3 May 2026 7:51 UTC
12 points
0 comments · 4 min read · LW link

Bypassing Refusal Behavior in Qwen Models via Activation Steering

Talib Mirza · 3 May 2026 6:07 UTC
1 point
0 comments · 2 min read · LW link

Notes on equanimity from the inside

nonplus · 2 May 2026 23:42 UTC
15 points
1 comment · 4 min read · LW link

Psychopathy: The Substrate

Dawn Drescher · 2 May 2026 22:48 UTC
5 points
0 comments · 8 min read · LW link
(impartial-priorities.org)

Measuring the ability of Opus 4.5 to fool narrow classifiers

2 May 2026 22:43 UTC
31 points
0 comments · 8 min read · LW link

Evaluating different AI’s on African livestock knowledge

Fatika Umar Ibrahim · 2 May 2026 20:28 UTC
23 points
4 comments · 1 min read · LW link

Announcing Metaculus Summer 2026 FutureEval Bot Tournament

postreal · 2 May 2026 20:27 UTC
1 point
0 comments · 4 min read · LW link
(www.metaculus.com)

You Are Not Immune To Mode Collapse

J Bostock · 2 May 2026 19:57 UTC
127 points
18 comments · 4 min read · LW link
(jbostock.substack.com)

AI Risk Agility Plans—v0.1

Chris_Leong · 2 May 2026 19:30 UTC
10 points
0 comments · 1 min read · LW link

A new rationalist self-improvement book: the 12 Levers

spencerg · 2 May 2026 17:40 UTC
56 points
1 comment · 6 min read · LW link

OpenAI’s red line for AI self-improvement is fundamentally flawed

Charbel-Raphaël · 2 May 2026 14:44 UTC
35 points
3 comments · 3 min read · LW link

Psychopathy: The Problem

Dawn Drescher · 2 May 2026 10:23 UTC
19 points
16 comments · 11 min read · LW link
(impartial-priorities.org)

Games that change your mind

KatjaGrace · 2 May 2026 7:40 UTC
74 points
42 comments · 3 min read · LW link
(worldspiritsockpuppet.com)

Understand why AI is a doom-risk in 39 captivating minutes

KatjaGrace · 2 May 2026 7:40 UTC
18 points
0 comments · 1 min read · LW link
(worldspiritsockpuppet.com)

Primary Care Physicians are Incompetent. We Need More of Them.

Hide · 2 May 2026 5:47 UTC
53 points
35 comments · 9 min read · LW link
(hidefromit.substack.com)

Contributing to Technical Research in the AI Safety End Game

Sturb · 2 May 2026 3:17 UTC
34 points
1 comment · 4 min read · LW link

A Simulation of Social Groups Under A Gift Economy

Mira Kennard · 2 May 2026 2:26 UTC
21 points
1 comment · 5 min read · LW link

Human-looking robots are a bad idea

martinkunev · 2 May 2026 1:04 UTC
1 point
0 comments · 4 min read · LW link

How Go Players Disempower Themselves to AI

Ashe Vazquez Nuñez · 1 May 2026 23:24 UTC
700 points
77 comments · 8 min read · LW link

Early-stage empirical work on “spillway motivations”

1 May 2026 21:29 UTC
26 points
3 comments · 8 min read · LW link

Exploration Hacking: Can LLMs Learn to Resist RL Training?

1 May 2026 20:54 UTC
24 points
0 comments · 8 min read · LW link

Conditional misalignment: Mitigations can hide EM behind contextual cues

1 May 2026 20:09 UTC
67 points
2 comments · 11 min read · LW link

Ambitious Mech Interp w/ Tensor-transformers on toy languages [Project Proposal]

Logan Riggs · 1 May 2026 19:17 UTC
21 points
0 comments · 2 min read · LW link

Risk from fitness-seeking AIs: mechanisms and mitigations

Alex Mallen · 1 May 2026 17:42 UTC
107 points
0 comments · 32 min read · LW link

Your four-dimensional body

PatrickDFarley · 1 May 2026 17:22 UTC
8 points
1 comment · 3 min read · LW link

Housing Roundup #14: You Can’t Build That

Zvi · 1 May 2026 16:50 UTC
25 points
1 comment · 23 min read · LW link
(thezvi.wordpress.com)

What do Russian olympiad winners think of HPMOR? Our data

Mikhail Samin · 1 May 2026 13:28 UTC
21 points
2 comments · 1 min read · LW link

Housing Roundup #13: More Dakka

Zvi · 1 May 2026 13:00 UTC
13 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

Qualia are internal variables but they are taken from different realm

avturchin · 1 May 2026 10:43 UTC
9 points
13 comments · 2 min read · LW link

Open strategic questions for digital minds

lucius · 1 May 2026 9:56 UTC
26 points
1 comment · 13 min read · LW link
(outpaced.substack.com)

Juriscription: finding the medicines missing somewhere

technicalities · 1 May 2026 9:55 UTC
29 points
1 comment · 2 min read · LW link

Self driving interview

KatjaGrace · 1 May 2026 8:30 UTC
16 points
0 comments · 4 min read · LW link
(worldspiritsockpuppet.com)

11 ways to be less deferential

KatjaGrace · 1 May 2026 8:00 UTC
22 points
3 comments · 2 min read · LW link
(worldspiritsockpuppet.com)

Sanity-checking “Incompressible Knowledge Probes”

1 May 2026 6:52 UTC
60 points
12 comments · 16 min read · LW link

Inkhaven, the 548th metapost

Sean Herrington · 1 May 2026 6:49 UTC
1 point
1 comment · 3 min read · LW link

Automating Interpretability with Agents

1 May 2026 2:59 UTC
10 points
0 comments · 10 min read · LW link

Against In-Duct UV

jefftk · 1 May 2026 2:40 UTC
23 points
0 comments · 3 min read · LW link
(www.jefftk.com)

Winners of the Manifund Essay Prize

Austin Chen · 1 May 2026 2:21 UTC
6 points
0 comments · 11 min read · LW link
(manifund.substack.com)

Reflections on InkHaven

David Scott Krueger · 1 May 2026 0:50 UTC
10 points
0 comments · 2 min read · LW link
(therealartificialintelligence.substack.com)