Cleo Nardo
Karma: 4,696
DMs open.
Which goals actually motivate deceptive alignment?
Cleo Nardo
and
Alex Mallen
19 May 2026 21:53 UTC
25
points
0
comments
10
min read
LW
link
Let’s have more partial insiders.
Cleo Nardo
19 May 2026 7:24 UTC
15
points
0
comments
2
min read
LW
link
Outsiders should focus on specs/constitutions (among other things)
Cleo Nardo
19 May 2026 1:04 UTC
4
points
5
comments
2
min read
LW
link
How do intentional secret loyalties differ from other schemer motivations?
Cleo Nardo
26 Apr 2026 20:03 UTC
25
points
1
comment
12
min read
LW
link
9 kinds of hard-to-verify tasks
Cleo Nardo
20 Apr 2026 14:43 UTC
60
points
0
comments
3
min read
LW
link
Automating philosophy if Timothy Williamson is correct
Cleo Nardo
20 Apr 2026 13:34 UTC
54
points
19
comments
2
min read
LW
link
Positive-sum interactions between players with linear utility in resources
Cleo Nardo
20 Mar 2026 0:42 UTC
12
points
0
comments
2
min read
LW
link
Sacred values of future AIs
Cleo Nardo
4 Mar 2026 7:47 UTC
58
points
4
comments
5
min read
LW
link
Ensuring Safety in Mixed Deployment
Cleo Nardo
26 Feb 2026 2:15 UTC
22
points
0
comments
5
min read
LW
link
Introspective RSI vs Extrospective RSI
Cleo Nardo
11 Feb 2026 11:54 UTC
10
points
6
comments
2
min read
LW
link
Focusing on Flourishing Even When Survival is Unlikely (Part I)
Cleo Nardo
17 Jan 2026 18:47 UTC
24
points
3
comments
4
min read
LW
link
North Sentinelese Post-Singularity
Cleo Nardo
11 Dec 2025 14:57 UTC
78
points
40
comments
1
min read
LW
link
Strategy-Stealing Argument Against AI Dealmaking
Cleo Nardo
1 Nov 2025 4:39 UTC
17
points
3
comments
2
min read
LW
link
A Very Simple Model of AI Dealmaking
Cleo Nardo
29 Oct 2025 0:33 UTC
18
points
0
comments
9
min read
LW
link
Stratified Utopia
Cleo Nardo
21 Oct 2025 19:09 UTC
86
points
8
comments
11
min read
LW
link
The Case for Mixed Deployment
Cleo Nardo
11 Sep 2025 6:14 UTC
50
points
4
comments
4
min read
LW
link
Gradient routing is better than pretraining filtering
Cleo Nardo
2 Sep 2025 9:05 UTC
51
points
3
comments
5
min read
LW
link
Here’s 18 Applications of Deception Probes
Cleo Nardo
,
Avi Parrack
and
jordinne
28 Aug 2025 18:59 UTC
45
points
0
comments
22
min read
LW
link
Looking for feature absorption automatically
Theodore Ehrenborg
,
Logan Riggs
and
Cleo Nardo
12 Aug 2025 20:46 UTC
16
points
0
comments
6
min read
LW
link
Trusted monitoring, but with deception probes.
Avi Parrack
,
StefanHex
and
Cleo Nardo
23 Jul 2025 5:26 UTC
31
points
0
comments
4
min read
LW
link
(arxiv.org)