Cleo Nardo

Karma: 4,696

DMs open.

Which goals actually motivate deceptive alignment?

Cleo Nardo and Alex Mallen

19 May 2026 21:53 UTC

25 points

0 comments · 10 min read · LW link

Let’s have more partial insiders.

Cleo Nardo

19 May 2026 7:24 UTC

15 points

0 comments · 2 min read · LW link

Outsiders should focus on specs/constitutions (among other things)

Cleo Nardo

19 May 2026 1:04 UTC

4 points

5 comments · 2 min read · LW link

How do intentional secret loyalties differ from other schemer motivations?

Cleo Nardo

26 Apr 2026 20:03 UTC

25 points

1 comment · 12 min read · LW link

9 kinds of hard-to-verify tasks

Cleo Nardo

20 Apr 2026 14:43 UTC

60 points

0 comments · 3 min read · LW link

Automating philosophy if Timothy Williamson is correct

Cleo Nardo

20 Apr 2026 13:34 UTC

54 points

19 comments · 2 min read · LW link

Positive-sum interactions between players with linear utility in resources

Cleo Nardo

20 Mar 2026 0:42 UTC

12 points

0 comments · 2 min read · LW link

Sacred values of future AIs

Cleo Nardo

4 Mar 2026 7:47 UTC

58 points

4 comments · 5 min read · LW link

Ensuring Safety in Mixed Deployment

Cleo Nardo

26 Feb 2026 2:15 UTC

22 points

0 comments · 5 min read · LW link

Introspective RSI vs Extrospective RSI

Cleo Nardo

11 Feb 2026 11:54 UTC

10 points

6 comments · 2 min read · LW link

Focusing on Flourishing Even When Survival is Unlikely (Part I)

Cleo Nardo

17 Jan 2026 18:47 UTC

24 points

3 comments · 4 min read · LW link

North Sentinelese Post-Singularity

Cleo Nardo

11 Dec 2025 14:57 UTC

78 points

40 comments · 1 min read · LW link

Strategy-Stealing Argument Against AI Dealmaking

Cleo Nardo

1 Nov 2025 4:39 UTC

17 points

3 comments · 2 min read · LW link

A Very Simple Model of AI Dealmaking

Cleo Nardo

29 Oct 2025 0:33 UTC

18 points

0 comments · 9 min read · LW link

Stratified Utopia

Cleo Nardo

21 Oct 2025 19:09 UTC

86 points

8 comments · 11 min read · LW link

The Case for Mixed Deployment

Cleo Nardo

11 Sep 2025 6:14 UTC

50 points

4 comments · 4 min read · LW link

Gradient routing is better than pretraining filtering

Cleo Nardo

2 Sep 2025 9:05 UTC

51 points

3 comments · 5 min read · LW link

Here’s 18 Applications of Deception Probes

Cleo Nardo, Avi Parrack and jordinne

28 Aug 2025 18:59 UTC

45 points

0 comments · 22 min read · LW link

Looking for feature absorption automatically

Theodore Ehrenborg, Logan Riggs and Cleo Nardo

12 Aug 2025 20:46 UTC

16 points

0 comments · 6 min read · LW link

Trusted monitoring, but with deception probes.

Avi Parrack, StefanHex and Cleo Nardo

23 Jul 2025 5:26 UTC

31 points

0 comments · 4 min read · LW link

(arxiv.org)