In combination with an AI with social skills that are fundamentally stunted in some way, this might actually work. If the AI cannot directly interface with the world in any meaningful way without the key and it doesn’t have any power to persuade a human actor to supply it with the key, it’s pretty much trapped ( unless there is some way for it to break its own encryption ).
Edit: notwithstanding the possibility that some human being may be stupid enough to supply it with the key despite not asking for it.
I think being encrypted may not actually help much with the control problem, since the problem isn’t that we expect an AI to fully understand what we want and then be evil, it’s that we’re worried that an AI will not be optimizing what we want. Not knowing what the outputs actually do doesn’t seem like it would help at all (except that the AI would only have the inputs we want it to have).
That seems to me to be the advantage. If it can’t read the data in its sensors without the key, it can’t know if any given action will increase or decrease its utility.
In combination with an AI with social skills that are fundamentally stunted in some way, this might actually work. If the AI cannot directly interface with the world in any meaningful way without the key and it doesn’t have any power to persuade a human actor to supply it with the key, it’s pretty much trapped ( unless there is some way for it to break its own encryption ).
Edit: notwithstanding the possibility that some human being may be stupid enough to supply it with the key despite not asking for it.
I think being encrypted may not actually help much with the control problem, since the problem isn’t that we expect an AI to fully understand what we want and then be evil, it’s that we’re worried that an AI will not be optimizing what we want. Not knowing what the outputs actually do doesn’t seem like it would help at all (except that the AI would only have the inputs we want it to have).
the AI would only have the inputs we want it to
That seems to me to be the advantage. If it can’t read the data in its sensors without the key, it can’t know if any given action will increase or decrease its utility.