Cleo Nardo
Karma:
3,012
DMs open.
Here’s 18 Applications of Deception Probes
Cleo Nardo
28 Aug 2025 18:59 UTC
31
points
0
comments
22
min read
LW
link
Looking for feature absorption automatically
Theodore Ehrenborg
,
Logan Riggs
and
Cleo Nardo
12 Aug 2025 20:46 UTC
16
points
0
comments
6
min read
LW
link
Trusted monitoring, but with deception probes.
Avi Parrack
,
StefanHex
and
Cleo Nardo
23 Jul 2025 5:26 UTC
31
points
0
comments
4
min read
LW
link
(arxiv.org)
Proposal for making credible commitments to AIs.
Cleo Nardo
27 Jun 2025 19:43 UTC
105
points
45
comments
2
min read
LW
link
Can SAE steering reveal sandbagging?
jordine
,
Hoang Khiem
,
Felix Hofstätter
and
Cleo Nardo
15 Apr 2025 12:33 UTC
35
points
3
comments
4
min read
LW
link
Rethinking Laplace’s Rule of Succession
Cleo Nardo
22 Nov 2024 18:46 UTC
11
points
5
comments
2
min read
LW
link
Appraising aggregativism and utilitarianism
Cleo Nardo
21 Jun 2024 23:10 UTC
27
points
10
comments
19
min read
LW
link
Aggregative principles approximate utilitarian principles
Cleo Nardo
12 Jun 2024 16:27 UTC
28
points
3
comments
23
min read
LW
link
Aggregative Principles of Social Justice
Cleo Nardo
5 Jun 2024 13:44 UTC
29
points
10
comments
37
min read
LW
link
Shortform
Cleo Nardo
1 Mar 2024 18:20 UTC
5
points
116
comments
1
min read
LW
link
Uncertainty in all its flavours
Cleo Nardo
9 Jan 2024 16:21 UTC
34
points
6
comments
35
min read
LW
link
Game Theory without Argmax [Part 2]
Cleo Nardo
11 Nov 2023 16:02 UTC
31
points
14
comments
13
min read
LW
link
Game Theory without Argmax [Part 1]
Cleo Nardo
11 Nov 2023 15:59 UTC
70
points
18
comments
19
min read
LW
link
MetaAI: less is less for alignment.
Cleo Nardo
13 Jun 2023 14:08 UTC
71
points
17
comments
5
min read
LW
link
Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs
Arjun Panickssery
,
Baldassare Castiglione
and
Cleo Nardo
24 May 2023 21:06 UTC
34
points
1
comment
1
min read
LW
link
(www.gov.uk)
List of requests for an AI slowdown/halt.
Cleo Nardo
14 Apr 2023 23:55 UTC
46
points
6
comments
1
min read
LW
link
Excessive AI growth-rate yields little socio-economic benefit.
Cleo Nardo
4 Apr 2023 19:13 UTC
27
points
22
comments
4
min read
LW
link
AI Summer Harvest
Cleo Nardo
4 Apr 2023 3:35 UTC
130
points
10
comments
1
min read
LW
link
The 0.2 OOMs/year target
Cleo Nardo
30 Mar 2023 18:15 UTC
84
points
24
comments
5
min read
LW
link
Wittgenstein and ML — parameters vs architecture
Cleo Nardo
24 Mar 2023 4:54 UTC
44
points
9
comments
5
min read
LW
link