2001zhaozhao comments on Current AIs seem pretty misaligned to me

2001zhaozhao 15 Apr 2026 22:42 UTC
7 points
Well, the manager in your case is not doing RL on honesty; it’s more like doing RL on “honest-looking task completion”, which can lead either to honest task completion or to dishonesty that isn’t caught. That’s not appreciably different from AI training here.