Pursuing the target

Adam Zerner, 3 May 2026 7:59 UTC
8 points
0 comments · 2 min read

Notes on equanimity from the inside

nonplus, 2 May 2026 23:42 UTC
10 points
0 comments · 4 min read

Psychopathy: The Substrate

Dawn Drescher, 2 May 2026 22:48 UTC
11 points
0 comments · 8 min read
(impartial-priorities.org)

Measuring the ability of Opus 4.5 to fool narrow classifiers

2 May 2026 22:43 UTC
30 points
0 comments · 8 min read

You Are Not Immune To Mode Collapse

J Bostock, 2 May 2026 19:57 UTC
73 points
14 comments · 4 min read
(jbostock.substack.com)

AI Risk Agility Plans—v0.1

Chris_Leong, 2 May 2026 19:30 UTC
8 points
0 comments · 1 min read

A new rationalist self-improvement book: the 12 Levers

spencerg, 2 May 2026 17:40 UTC
37 points
0 comments · 6 min read

OpenAI’s red line for AI self-improvement is fundamentally flawed

Charbel-Raphaël, 2 May 2026 14:44 UTC
31 points
1 comment · 3 min read

Psychopathy: The Problem

Dawn Drescher, 2 May 2026 10:23 UTC
8 points
5 comments · 11 min read
(impartial-priorities.org)

Games that change your mind

KatjaGrace, 2 May 2026 7:40 UTC
54 points
17 comments · 3 min read
(worldspiritsockpuppet.com)

Primary Care Physicians are Incompetent. We Need More of Them.

Hide, 2 May 2026 5:47 UTC
38 points
23 comments · 9 min read
(hidefromit.substack.com)

Contributing to Technical Research in the AI Safety End Game

Sturb, 2 May 2026 3:17 UTC
24 points
0 comments · 4 min read

A Simulation of Social Groups Under A Gift Economy

Mira Kennard, 2 May 2026 2:26 UTC
20 points
1 comment · 5 min read

Human-looking robots are a bad idea

martinkunev, 2 May 2026 1:04 UTC
1 point
0 comments · 4 min read

How Go Players Disempower Themselves to AI

Ashe Vazquez Nuñez, 1 May 2026 23:24 UTC
315 points
21 comments · 8 min read

Early-stage empirical work on “spillway motivations”

1 May 2026 21:29 UTC
21 points
0 comments · 8 min read

Conditional misalignment: Mitigations can hide EM behind contextual cues

1 May 2026 20:09 UTC
60 points
1 comment · 11 min read

Ambitious Mech Interp w/ Tensor-transformers on toy languages [Project Proposal]

Logan Riggs, 1 May 2026 19:17 UTC
19 points
0 comments · 2 min read

Risk from fitness-seeking AIs: mechanisms and mitigations

Alex Mallen, 1 May 2026 17:42 UTC
93 points
0 comments · 32 min read

Your four-dimensional body

PatrickDFarley, 1 May 2026 17:22 UTC
6 points
0 comments · 3 min read