[Question] Does the hardness of AI alignment undermine FOOM?

Since the arguments that AI alignment is hard don't depend on any specifics about our level of intelligence, shouldn't those same arguments convince a future AI to refrain from engaging in self-improvement?

More specifically, if the argument that we should expect a more intelligent AI we build to optimize a simple global utility function that isn't aligned with our own goals is valid, then why won't the very same argument convince a future AI that it can't trust an even more intelligent AI it generates to share its goals?

Note that the standard AI x-risk arguments also assume that a highly intelligent agent will be extremely likely to optimize some simple global utility function. This implies the AI will care about alignment for future versions of itself [1], and hence that it won't pursue self-improvement, for the same reasons it's claimed we should hesitate to build AGI.

I'm not saying this argument can't be countered, but I think doing so at the very least requires clarifying, in useful ways, the assumptions and reasoning that purport to show alignment will be hard to achieve.

For instance, do these arguments implicitly assume that the AI we create is very different from our own brains, and so don't apply to AI self-improvement (though perhaps the improvement itself requires major changes too)? If so, doesn't that suggest that an AGI which very closely tracks the operation of our own brains is safe?

--

1: Except in the very unlikely case that it happens to have the one exact utility function that says to always maximize local increases in intelligence regardless of its long-term effects.