Rohin Shah comments on Alignment Newsletter #41

Rohin Shah 22 Jan 2019 2:12 UTC
But, maybe what you are saying is that in “the intersection of what the user expects and what the user wants”, the first is functioning as a constraint, and the second is functioning as a motivation system (basically the usual IRL motivation system).
This is basically what I meant. Thanks for clarifying that you meant something else.
The most obvious problem I see with this approach is that it seems to imply that the AI can’t help the human do anything which the human doesn’t already know how to do.
Yeah, this is my concern with the thing you actually meant. (It’s also why I incorrectly assumed that “what the user wants” was meant to be goal-directed optimization, as opposed to being about policies the user approves of.) It could work when combined with something like amplification, where you get to assume that the overseer is smarter than the agent, but then it’s not clear whether the “what the user expects” part buys you anything over the “what the user wants” part.
A third interpretation of your concern is that you’re saying that if the thing is doing well enough to get groceries, there has to be powerful optimization somewhere, and wherever it is, it’s going to be pushing toward perverse instantiations one way or another. I don’t have any argument against this concern, but I think it mostly amounts to a concern about inner optimizers.
This does seem like a concern, but it wasn’t the one I was thinking about. It also seems like a concern about basically any existing proposal. Usually when talking about concerns I don’t bring up the ones that are always concerns, unless someone explicitly claims that their solution obviates that concern.