I agree that Goodharting is an issue, and it has been discussed as a failure mode, but a lot of AI risk writing assumed that something like random diffusion was a non-trivial component of how AI alignment failures would happen.
For example, nearly all of the reasoning about why a randomly sampled program would be misaligned or bad relies on the random diffusion argument.