Joe Carlsmith comments on Giving AIs safe motivations

Joe Carlsmith 15 Nov 2025 19:07 UTC
2 points
By default, step 3 (reward-on-the-episode seekers aren’t directly optimizing for your future efforts at studying their generalization to fail in the direction of AI takeover), but I do think the line here can get a bit blurry.