Cleo Nardo

Karma: 3,012

DMs open.

Here’s 18 Applications of Deception Probes

Cleo Nardo, 28 Aug 2025 18:59 UTC
31 points
0 comments, 22 min read, LW link

Looking for feature absorption automatically

12 Aug 2025 20:46 UTC
16 points
0 comments, 6 min read, LW link

Trusted monitoring, but with deception probes.

23 Jul 2025 5:26 UTC
31 points
0 comments, 4 min read, LW link
(arxiv.org)

Proposal for making credible commitments to AIs.

Cleo Nardo, 27 Jun 2025 19:43 UTC
105 points
45 comments, 2 min read, LW link

Can SAE steering reveal sandbagging?

15 Apr 2025 12:33 UTC
35 points
3 comments, 4 min read, LW link

Rethinking Laplace’s Rule of Succession

Cleo Nardo, 22 Nov 2024 18:46 UTC
11 points
5 comments, 2 min read, LW link

Appraising aggregativism and utilitarianism

Cleo Nardo, 21 Jun 2024 23:10 UTC
27 points
10 comments, 19 min read, LW link

Aggregative principles approximate utilitarian principles

Cleo Nardo, 12 Jun 2024 16:27 UTC
28 points
3 comments, 23 min read, LW link

Aggregative Principles of Social Justice

Cleo Nardo, 5 Jun 2024 13:44 UTC
29 points
10 comments, 37 min read, LW link

Shortform

Cleo Nardo, 1 Mar 2024 18:20 UTC
5 points
116 comments, 1 min read, LW link

Uncertainty in all its flavours

Cleo Nardo, 9 Jan 2024 16:21 UTC
34 points
6 comments, 35 min read, LW link

Game Theory without Argmax [Part 2]

Cleo Nardo, 11 Nov 2023 16:02 UTC
31 points
14 comments, 13 min read, LW link

Game Theory without Argmax [Part 1]

Cleo Nardo, 11 Nov 2023 15:59 UTC
70 points
18 comments, 19 min read, LW link

MetaAI: less is less for alignment.

Cleo Nardo, 13 Jun 2023 14:08 UTC
71 points
17 comments, 5 min read, LW link

Rishi Sunak mentions “existential threats” in talk with OpenAI, DeepMind, Anthropic CEOs

24 May 2023 21:06 UTC
34 points
1 comment, 1 min read, LW link
(www.gov.uk)

List of requests for an AI slowdown/halt.

Cleo Nardo, 14 Apr 2023 23:55 UTC
46 points
6 comments, 1 min read, LW link

Excessive AI growth-rate yields little socio-economic benefit.

Cleo Nardo, 4 Apr 2023 19:13 UTC
27 points
22 comments, 4 min read, LW link

AI Summer Harvest

Cleo Nardo, 4 Apr 2023 3:35 UTC
130 points
10 comments, 1 min read, LW link

The 0.2 OOMs/year target

Cleo Nardo, 30 Mar 2023 18:15 UTC
84 points
24 comments, 5 min read, LW link

Wittgenstein and ML — parameters vs architecture

Cleo Nardo, 24 Mar 2023 4:54 UTC
44 points
9 comments, 5 min read, LW link