Cleo Nardo

Karma: 4,696

DMs open.

Which goals actually motivate deceptive alignment?

19 May 2026 21:53 UTC
25 points
0 comments, 10 min read

Let’s have more partial insiders.

Cleo Nardo, 19 May 2026 7:24 UTC
15 points
0 comments, 2 min read

Outsiders should focus on specs/constitutions (among other things)

Cleo Nardo, 19 May 2026 1:04 UTC
4 points
5 comments, 2 min read

How do intentional secret loyalties differ from other schemer motivations?

Cleo Nardo, 26 Apr 2026 20:03 UTC
25 points
1 comment, 12 min read

9 kinds of hard-to-verify tasks

Cleo Nardo, 20 Apr 2026 14:43 UTC
60 points
0 comments, 3 min read

Automating philosophy if Timothy Williamson is correct

Cleo Nardo, 20 Apr 2026 13:34 UTC
54 points
19 comments, 2 min read

Positive-sum interactions between players with linear utility in resources

Cleo Nardo, 20 Mar 2026 0:42 UTC
12 points
0 comments, 2 min read

Sacred values of future AIs

Cleo Nardo, 4 Mar 2026 7:47 UTC
58 points
4 comments, 5 min read

Ensuring Safety in Mixed Deployment

Cleo Nardo, 26 Feb 2026 2:15 UTC
22 points
0 comments, 5 min read

Introspective RSI vs Extrospective RSI

Cleo Nardo, 11 Feb 2026 11:54 UTC
10 points
6 comments, 2 min read

Focusing on Flourishing Even When Survival is Unlikely (Part I)

Cleo Nardo, 17 Jan 2026 18:47 UTC
24 points
3 comments, 4 min read

North Sentinelese Post-Singularity

Cleo Nardo, 11 Dec 2025 14:57 UTC
78 points
40 comments, 1 min read

Strategy-Stealing Argument Against AI Dealmaking

Cleo Nardo, 1 Nov 2025 4:39 UTC
17 points
3 comments, 2 min read

A Very Simple Model of AI Dealmaking

Cleo Nardo, 29 Oct 2025 0:33 UTC
18 points
0 comments, 9 min read

Stratified Utopia

Cleo Nardo, 21 Oct 2025 19:09 UTC
86 points
8 comments, 11 min read

The Case for Mixed Deployment

Cleo Nardo, 11 Sep 2025 6:14 UTC
50 points
4 comments, 4 min read

Gradient routing is better than pretraining filtering

Cleo Nardo, 2 Sep 2025 9:05 UTC
51 points
3 comments, 5 min read

Here’s 18 Applications of Deception Probes

28 Aug 2025 18:59 UTC
45 points
0 comments, 22 min read

Looking for feature absorption automatically

12 Aug 2025 20:46 UTC
16 points
0 comments, 6 min read

Trusted monitoring, but with deception probes.

23 Jul 2025 5:26 UTC
31 points
0 comments, 4 min read
(arxiv.org)