[Question] Are there any impossibility theorems for strong and safe AI?

Are there any impossibility theorems for the existence of AI that is both strong and safe? I think such theorems would be interesting because they could help to evaluate proposals for safe AI: we could ask “which assumption does this proposal break?”

I have a vague sense that a theorem of this sort could be developed along the following lines:
1. The kind of strong AI we want is a technological tool: it is easy to tell it what to do, and it can successfully do a wide variety of complex things when told
2. Simple instructions + complex results → the AI has a lot of flexibility in its actions
3. There are only a few ways to reliably achieve goals that require complex behaviour, e.g. something approximating expected utility maximisation (sketched briefly after this list)
4. 2+3 + instrumental convergence → flexibility is likely to be exploited in dangerous ways
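
To make premise 3 a bit more concrete, here is a minimal sketch of the expected-utility formulation it points at, written as a compilable LaTeX snippet. The symbols ($\Pi$, $P$, $U$, $\pi^{*}$) are notation I'm introducing purely for illustration; they are not taken from any existing impossibility theorem.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
% Premise 3, as a minimal sketch: a system that reliably achieves complex
% goals is modelled as (approximately) choosing the policy that maximises
% expected utility.  \Pi is the set of available policies, P(o | \pi) the
% outcome distribution induced by policy \pi, and U a utility function
% encoding the instruction -- all illustrative notation.
\[
  \pi^{*} \;=\; \operatorname*{arg\,max}_{\pi \in \Pi} \;
  \mathbb{E}_{o \sim P(\,\cdot \mid \pi)}\!\bigl[\, U(o) \,\bigr]
\]
% Premise 4 then claims that for a wide range of U, the maximising policy
% \pi^{*} tends toward convergent instrumental behaviour (e.g. resource
% acquisition, shutdown avoidance), which is where the danger enters.
\end{document}
```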

Do fleshed-out versions of this argument exist? Do you have any other ideas about impossibility theorems?