I can see an argument for “outer alignment is also important, e.g. to avoid failure via sycophancy++”, but this doesn’t seem to disagree with this post? (I understand the post to argue what you should do about scheming, rather than whether scheming is the focus.)
Having good outer alignment incidentally prevents a lot of scheming. But the reverse isn’t nearly as true.
I don’t understand why this is true (I don’t claim the reverse is true either). I don’t expect a great deal of correlation / implication here.
The second thing impacts the first thing :) If a lot of scheming is due to poor reward structure, and we should work on better reward structure, then we should work on scheming prevention.