I don’t know; I do expect the line to slow down, though I’m not sure when. (See e.g. here and here for other people’s analysis of this point.)
> Interested in the answer to this, and how much it looks like/disagrees with my proposal
It’s of a different type signature than your proposal. I agree that “how should infrastructure and institutions be changed” is an important question; it’s just not what I focus on. I think that there is still a technical question that needs to be answered: how do you build AI systems that do what you want them to do?
In particular, nearly all AI algorithms that have ever been developed assume a known goal / specification, and then figure out how to achieve that goal. If this were to continue all the way to superintelligent AI systems, I’d be very worried, because of convergent instrumental subgoals. I don’t think this will continue all the way to superintelligent AI systems, but that’s because I expect people (including myself) to figure out how to build AI systems in a different way, so that they optimize for our goals instead of their own goals.
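To make the first claim concrete, here is a minimal toy sketch of the usual pattern: the designer writes the objective down in advance, and the algorithm’s only job is to maximize it. The environment, reward function, and hill-climbing rule below are illustrative assumptions, not any particular system.

```python
import random

# Toy illustration of the standard setup: the objective is fixed up front,
# and the algorithm only figures out how to achieve it.
# (Environment, reward, and target here are hypothetical.)

def reward(state):
    # Written down by the designer in advance; never questioned by the agent.
    return -abs(state - 7)  # "get the state to 7"

def step(state, action):
    # Trivial dynamics: actions are -1, 0, or +1.
    return state + action

def optimize(episodes=200, epsilon=0.1):
    # Epsilon-greedy hill climbing on the fixed reward.
    state = 0
    for _ in range(episodes):
        greedy = max([-1, 0, 1], key=lambda a: reward(step(state, a)))
        action = greedy if random.random() > epsilon else random.choice([-1, 0, 1])
        state = step(state, action)
    return state

print(optimize())  # ends up near 7, the hard-coded target
```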
Of course, one way to do this would be to encode a perfect representation of human values into the system, but, like you, I think this is unlikely to work (see also Chapter 1 of the Value Learning sequence). I usually think of the goal as “figure out how to build an AI system that is trying to help us”, where part of helpful behavior is clarifying our preferences / values with us, ensuring that we have accurate information, etc. (See Clarifying AI Alignment and my comment on it.) Think of this as trying to figure out how to embed the skills of a great personal assistant into an AI system.
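As a toy contrast (purely illustrative, and not a claim about how such a system would actually be built), here is a sketch in which the agent is uncertain about the objective, and asking the human a clarifying question is itself one of its available options:

```python
# Toy contrast: the agent maintains uncertainty over which goal the human
# wants, and "ask a clarifying question" is one of its options.
# All specifics (candidate goals, query cost) are illustrative assumptions.

CANDIDATE_GOALS = [3, 7, 11]  # targets the human might have in mind
belief = {g: 1 / len(CANDIDATE_GOALS) for g in CANDIDATE_GOALS}

def expected_loss(target, belief):
    # Expected cost of committing to `target` under the current belief.
    return sum(p * abs(target - g) for g, p in belief.items())

def act(belief, query_cost=1.0, true_goal=7):
    best_guess = min(CANDIDATE_GOALS, key=lambda t: expected_loss(t, belief))
    if expected_loss(best_guess, belief) > query_cost:
        # Asking is cheaper than risking acting on the wrong goal.
        answer = true_goal  # stands in for actually asking the human
        belief = {g: float(g == answer) for g in CANDIDATE_GOALS}
        best_guess = min(CANDIDATE_GOALS, key=lambda t: expected_loss(t, belief))
    return best_guess

print(act(belief))  # asks first, then pursues the clarified goal (7)
```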