Pursuing the target
Adam Zerner
3 May 2026 7:59 UTC
8
points
0
comments
2
min read
LW
link
Notes on equanimity from the inside
nonplus
2 May 2026 23:42 UTC
10
points
0
comments
4
min read
LW
link
Psychopathy: The Substrate
Dawn Drescher
2 May 2026 22:48 UTC
11
points
0
comments
8
min read
LW
link
(impartial-priorities.org)
Measuring the ability of Opus 4.5 to fool narrow classifiers
Fabien Roger
and
John Hughes
2 May 2026 22:43 UTC
30
points
0
comments
8
min read
LW
link
You Are Not Immune To Mode Collapse
J Bostock
2 May 2026 19:57 UTC
73
points
14
comments
4
min read
LW
link
(jbostock.substack.com)
AI Risk Agility Plans—v0.1
Chris_Leong
2 May 2026 19:30 UTC
8
points
0
comments
1
min read
LW
link
A new rationalist self-improvement book: the 12 Levers
spencerg
2 May 2026 17:40 UTC
37
points
0
comments
6
min read
LW
link
OpenAI’s red line for AI self-improvement is fundamentally flawed
Charbel-Raphaël
2 May 2026 14:44 UTC
31
points
1
comment
3
min read
LW
link
Psychopathy: The Problem
Dawn Drescher
2 May 2026 10:23 UTC
8
points
5
comments
11
min read
LW
link
(impartial-priorities.org)
Games that change your mind
KatjaGrace
2 May 2026 7:40 UTC
54
points
17
comments
3
min read
LW
link
(worldspiritsockpuppet.com)
Primary Care Physicians are Incompetent. We Need More of Them.
Hide
2 May 2026 5:47 UTC
38
points
23
comments
9
min read
LW
link
(hidefromit.substack.com)
Contributing to Technical Research in the AI Safety End Game
Sturb
2 May 2026 3:17 UTC
24
points
0
comments
4
min read
LW
link
A Simulation of Social Groups Under A Gift Economy
Mira Kennard
2 May 2026 2:26 UTC
20
points
1
comment
5
min read
LW
link
Human-looking robots are a bad idea
martinkunev
2 May 2026 1:04 UTC
1
point
0
comments
4
min read
LW
link
How Go Players Disempower Themselves to AI
Ashe Vazquez Nuñez
1 May 2026 23:24 UTC
315
points
21
comments
8
min read
LW
link
Early-stage empirical work on “spillway motivations”
Arjun Khandelwal
,
Anders Cairns Woodruff
and
Alex Mallen
1 May 2026 21:29 UTC
21
points
0
comments
8
min read
LW
link
Conditional misalignment: Mitigations can hide EM behind contextual cues
Jan Dubiński
and
Owain_Evans
1 May 2026 20:09 UTC
60
points
1
comment
11
min read
LW
link
Ambitious Mech Interp w/ Tensor-transformers on toy languages [Project Proposal]
Logan Riggs
1 May 2026 19:17 UTC
19
points
0
comments
2
min read
LW
link
Risk from fitness-seeking AIs: mechanisms and mitigations
Alex Mallen
1 May 2026 17:42 UTC
93
points
0
comments
32
min read
LW
link
Your four-dimensional body
PatrickDFarley
1 May 2026 17:22 UTC
6
points
0
comments
3
min read
LW
link