Cleo Nardo
Karma: 3,738
DMs open.
North Sentinelese Post-Singularity
Cleo Nardo | 11 Dec 2025 14:57 UTC | 66 points | 37 comments | 1 min read

Strategy-Stealing Argument Against AI Dealmaking
Cleo Nardo | 1 Nov 2025 4:39 UTC | 16 points | 3 comments | 2 min read

A Very Simple Model of AI Dealmaking
Cleo Nardo | 29 Oct 2025 0:33 UTC | 18 points | 0 comments | 9 min read

Stratified Utopia
Cleo Nardo | 21 Oct 2025 19:09 UTC | 73 points | 8 comments | 11 min read

The Case for Mixed Deployment
Cleo Nardo | 11 Sep 2025 6:14 UTC | 43 points | 4 comments | 4 min read

Gradient routing is better than pretraining filtering
Cleo Nardo | 2 Sep 2025 9:05 UTC | 46 points | 3 comments | 5 min read

Here’s 18 Applications of Deception Probes
Cleo Nardo, Avi Parrack and jordine | 28 Aug 2025 18:59 UTC | 45 points | 0 comments | 22 min read

Looking for feature absorption automatically
Theodore Ehrenborg, Logan Riggs and Cleo Nardo | 12 Aug 2025 20:46 UTC | 16 points | 0 comments | 6 min read

Trusted monitoring, but with deception probes.
Avi Parrack, StefanHex and Cleo Nardo | 23 Jul 2025 5:26 UTC | 31 points | 0 comments | 4 min read | (arxiv.org)

Proposal for making credible commitments to AIs.
Cleo Nardo | 27 Jun 2025 19:43 UTC | 107 points | 45 comments | 2 min read

Can SAE steering reveal sandbagging?
jordine, Hoang Khiem, Felix Hofstätter and Cleo Nardo | 15 Apr 2025 12:33 UTC | 35 points | 3 comments | 4 min read

Rethinking Laplace’s Rule of Succession
Cleo Nardo | 22 Nov 2024 18:46 UTC | 13 points | 5 comments | 2 min read

Appraising aggregativism and utilitarianism
Cleo Nardo | 21 Jun 2024 23:10 UTC | 27 points | 10 comments | 19 min read

Aggregative principles approximate utilitarian principles
Cleo Nardo | 12 Jun 2024 16:27 UTC | 28 points | 3 comments | 23 min read

Aggregative Principles of Social Justice
Cleo Nardo | 5 Jun 2024 13:44 UTC | 29 points | 10 comments | 37 min read

Shortform
Cleo Nardo | 1 Mar 2024 18:20 UTC | 5 points | 214 comments | 1 min read

Uncertainty in all its flavours
Cleo Nardo | 9 Jan 2024 16:21 UTC | 34 points | 6 comments | 35 min read

Game Theory without Argmax [Part 2]
Cleo Nardo | 11 Nov 2023 16:02 UTC | 31 points | 14 comments | 13 min read

Game Theory without Argmax [Part 1]
Cleo Nardo | 11 Nov 2023 15:59 UTC | 70 points | 18 comments | 19 min read

MetaAI: less is less for alignment.
Cleo Nardo | 13 Jun 2023 14:08 UTC | 71 points | 17 comments | 5 min read