How do I know I’m not simulated by the AI to determine my reactions to different escape attempts? How much computing power does it have? Do I have access to its internals?
The situation seems somewhat underspecified to give a definite answer, but given the stakes I’d err on the side of terminating the AI with extreme prejudice. Bonus points if I can figure out a safe way to retain information on its goals so I can make sure the future contains as little utility for it as feasible.
The utility-minimizing part may be an overreaction but it does give me an idea: Maybe we should also cooperate with an unfriendly AI to such an extent that it’s better for it to negotiate instead of escaping and taking over the universe.
As I understand Eliezer’s position, when babyeater-humans say “right”, they actually mean babyeating. They’d need a word like “babysaving” to refer to what’s right.
Morality is what we call the output of a particular algorithm instantiated in human brains. If we instantiated a different algorithm, we’d have a word for its output instead.
I think Eliezer sees translating babyeater word for babyeating as “right” as an error similar to translating their word for babyeaters as “human”.