Cleo Nardo

Karma: 3,738

DMs open.

North Sentinelese Post-Singularity

Cleo Nardo, 11 Dec 2025 14:57 UTC
66 points
37 comments, 1 min read, LW link

Strategy-Stealing Argument Against AI Dealmaking

Cleo Nardo, 1 Nov 2025 4:39 UTC
16 points
3 comments, 2 min read, LW link

A Very Simple Model of AI Dealmaking

Cleo Nardo, 29 Oct 2025 0:33 UTC
18 points
0 comments, 9 min read, LW link

Stratified Utopia

Cleo Nardo, 21 Oct 2025 19:09 UTC
73 points
8 comments, 11 min read, LW link

The Case for Mixed Deployment

Cleo Nardo, 11 Sep 2025 6:14 UTC
43 points
4 comments, 4 min read, LW link

Gradient routing is better than pretraining filtering

Cleo Nardo, 2 Sep 2025 9:05 UTC
46 points
3 comments, 5 min read, LW link

Here’s 18 Applications of Deception Probes

28 Aug 2025 18:59 UTC
45 points
0 comments, 22 min read, LW link

Looking for feature absorption automatically

12 Aug 2025 20:46 UTC
16 points
0 comments, 6 min read, LW link

Trusted monitoring, but with deception probes.

23 Jul 2025 5:26 UTC
31 points
0 comments, 4 min read, LW link (arxiv.org)

Proposal for making credible commitments to AIs.

Cleo Nardo, 27 Jun 2025 19:43 UTC
107 points
45 comments, 2 min read, LW link

Can SAE steering reveal sandbagging?

15 Apr 2025 12:33 UTC
35 points
3 comments, 4 min read, LW link

Rethinking Laplace’s Rule of Succession

Cleo Nardo, 22 Nov 2024 18:46 UTC
13 points
5 comments, 2 min read, LW link

Appraising aggregativism and utilitarianism

Cleo Nardo, 21 Jun 2024 23:10 UTC
27 points
10 comments, 19 min read, LW link

Aggregative principles approximate utilitarian principles

Cleo Nardo, 12 Jun 2024 16:27 UTC
28 points
3 comments, 23 min read, LW link

Aggregative Principles of Social Justice

Cleo Nardo, 5 Jun 2024 13:44 UTC
29 points
10 comments, 37 min read, LW link

Shortform

Cleo Nardo, 1 Mar 2024 18:20 UTC
5 points
214 comments, 1 min read, LW link

Uncertainty in all its flavours

Cleo Nardo, 9 Jan 2024 16:21 UTC
34 points
6 comments, 35 min read, LW link

Game Theory without Argmax [Part 2]

Cleo Nardo, 11 Nov 2023 16:02 UTC
31 points
14 comments, 13 min read, LW link

Game Theory without Argmax [Part 1]

Cleo Nardo, 11 Nov 2023 15:59 UTC
70 points
18 comments, 19 min read, LW link

MetaAI: less is less for alignment.

Cleo Nardo, 13 Jun 2023 14:08 UTC
71 points
17 comments, 5 min read, LW link