
Jozdien

Karma: 3,162

Preventative Steering has advantages over Inoculation Prompting

24 Jun 2026 0:47 UTC
22 points
2 comments, 4 min read

The distillation double bind: Distilling misaligned models either transfers misalignment or it doesn’t

18 Jun 2026 21:21 UTC
57 points
8 comments, 5 min read
(blog.redwoodresearch.org)

Incriminating misaligned AI models via distillation

15 May 2026 21:43 UTC
116 points
12 comments, 5 min read

Recursive forecasting: Eliciting long-term forecasts from myopic fitness-seekers

28 Apr 2026 18:00 UTC
55 points
2 comments, 7 min read

Desiderata of good problems to hand off to AIs

Jozdien, 19 Jan 2026 16:55 UTC
29 points
1 comment, 4 min read

How hard is it to inoculate against misalignment generalization?

Jozdien, 6 Jan 2026 17:30 UTC
46 points
4 comments, 14 min read