Do you count avoiding reward-on-the-episode-seekers as part of step 2 or step 3?
By default step 3 (reward-on-the-episode seekers aren't directly optimizing to make your future efforts at studying their generalization fail in the direction of AI takeover), but I do think the line here can get a bit blurry.