I do see a post with a formal result, which seems like a direct contradiction to what you’re saying, though I’ll look in more detail.
If you mean to suggest this post has a positive result, then I think you’re just misreading it; the key result is:
The conclusion of this post is the following: if there exists some set of natural tasks for which the fastest way to solve them is to do some sort of machine learning to find a good policy, and there is some task for which that machine learning results in deceptive behavior, then there exists a natural task such that the minimal circuit that solves that task also produces deceptive behavior.
which says that, under some assumptions, there exists a task for which the minimal circuit will engage in deceptive behavior (i.e., is a malign inner optimizer).
The comment with a counterexample on the original post is here.
I see, I definitely didn’t read that closely enough.