[Question] Collection of arguments to expect (outer and inner) alignment failure?

Various arguments have been made for why advanced AI systems will plausibly end up pursuing goals other than those their operators intended (due to either outer or inner alignment failure).

I would really like a distilled collection of the strongest arguments.

Does anyone know if this has been done?

If not, I might try to make it. So, any replies pointing me to resources with arguments that I’ve missed (in my own answer) would also be much appreciated!

Clarification: I’m most interested in arguments that alignment failure is plausible, rather than merely possible. (There are already examples establishing the possibility of outer and inner alignment failure in current ML systems, which probably implies we can’t rule it out for more advanced versions of those systems either.)
