Coverage-driven alignment—What ‘Teaching Claude Why’ can borrow from AV verification
Yoav Hollander
8 Jun 2026 11:42 UTC
2 points, 0 comments, 14 min read
(blog.foretellix.com)
Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment
Sayhan Yalvaçer
8 Jun 2026 7:06 UTC
−1 points, 0 comments, 3 min read
Mental causation is not load-bearing
jessicata
7 Jun 2026 20:43 UTC
29 points, 2 comments, 10 min read
How Far Apart Does a Model Think Its Tokens Are?
Brendan Long
7 Jun 2026 20:20 UTC
36 points, 0 comments, 9 min read
Autopilot Thinking
XelaP
7 Jun 2026 20:20 UTC
10 points, 4 comments, 6 min read
Secret Loyalties Likely Raise Remote-Influenceability
Kaustubh Kislay
7 Jun 2026 17:51 UTC
11 points, 0 comments, 6 min read
From One Piece to One Pace - Vision and mission in temporary coordination of agents
a unemployed pastor- de S Brito
7 Jun 2026 17:07 UTC
4 points, 0 comments, 3 min read
Neglected Basics of AI Alignment
Quirinus_Quirrell
7 Jun 2026 9:02 UTC
26 points, 0 comments, 6 min read
Can activation verbalizers surface an internal chain of thought?
oakhu and ryan_greenblatt
7 Jun 2026 4:24 UTC
92 points, 0 comments, 16 min read
Against Corrigibility
peralice
6 Jun 2026 20:28 UTC
58 points, 16 comments, 12 min read
Freud heard a rumor that Science existed, and had a wonderful dream
Bruce Middleton
6 Jun 2026 14:47 UTC
8 points, 8 comments, 6 min read
Coalitional Darwinism and the Instrumental Utility of Individuality
CarolusRenniusVitellius
6 Jun 2026 12:53 UTC
24 points, 5 comments, 17 min read
(charlesr-w.github.io)
Why Software Automation Is Hard
silentbob
6 Jun 2026 8:56 UTC
104 points, 19 comments, 12 min read
What if Anthropic unilaterally paused capabilities development right now?
Karl von Wendt
6 Jun 2026 7:39 UTC
51 points, 12 comments, 3 min read
Optimisation over non-stationary distributions creates weirder minds
Samuel Ratnam and Pjain
6 Jun 2026 0:05 UTC
33 points, 1 comment, 4 min read
[Question]
Does robotics capabilities research accelerate AGI timelines?
Master Chief
5 Jun 2026 23:32 UTC
4 points, 3 comments, 1 min read
Two More Methods for Consistency Training and Some New Ways to Apply It
David Africa, Sukrati_Gautam, Neil Shah and arav-dhoot
5 Jun 2026 21:06 UTC
18 points, 0 comments, 7 min read
Revisiting GSM-Symbolic: models seem to reason okay, actually
Sturb
5 Jun 2026 20:54 UTC
12 points, 0 comments, 5 min read
Accepting Death & Adult Responsibility
Unreal
5 Jun 2026 19:23 UTC
−19 points, 10 comments, 4 min read
The Masochistic Prior
Dev.Roland
5 Jun 2026 19:05 UTC
12 points, 1 comment, 2 min read
(substack.com)