The tricky question is Goodhart, and this is a point where I disagree with Charlie Steiner. I do think humans are at least Boltzmann-rational in all non-political areas, and I think that domain is wide enough, capabilities-wise, for this to work (though it kills any efforts against misuse). I also think that sandboxing an AI so that it has zero probability of discovering politics is actually possible.
In short, I am much more optimistic than Charlie Steiner about human rationality in all non-political areas, and I think sandboxing is possible.
This certainly hurts capabilities, especially the social capabilities of systems like LLMs, which is a real cost, though RLHF might prevent it from becoming a serious problem.