By default, step 3 does not apply (reward-on-the-episode seekers aren’t directly optimizing for your future efforts at studying their generalization to fail in the direction of AI takeover), but I do think the line here can get a bit blurry.