I think if the AI tries to take over for training-data reasons, it's probably more of a shallow roleplay, and while inconvenient, probably not catastrophic. Like, if the AI is roleplaying an AI system taking over, it's not actually going to come up with completely novel ways of disempowering humanity, and it's not actually throwing its full cognition behind the task.
The much scarier thing is the AI deciding to take over because taking over is instrumentally convergent. In that case the AI is likely actually trying to take over, rather than merely roleplaying it.