Wei Dai comments on Will humans build goal-directed agents?

Wei Dai 6 Jan 2019 7:31 UTC
LW: 5 AF: 2
0
AF
Here are a few more reasons for humans to build goal-directed agents:
1. Goal directed AI is a way to defend against value drift/corruption/manipulation. People might be forced to build goal directed agents if they can’t figure out another way to do that.
2. Goal directed AI is a way to cooperate and thereby increase economic efficiency and/or military competitiveness. (A group of people can build a goal directed agent that they can verify represents an aggregation of their values.) People might be forced to build or transfer control to goal directed agents in order to participate in such cooperation to remain competitive, unless they can figure out another way to cooperate that is as efficient as this.
3. Goal directed AI is a way to address other human safety problems. People might trust an AI with explicit and verifiable values more than an AI that is controlled by a distant stranger.
What links here?
- Wei Dai's comment on AI safety without goal-directed behavior by Rohin Shah (8 Jan 2019 5:32 UTC; 21 points)
- Rohin Shah 6 Jan 2019 11:55 UTC
  LW: 2 AF: 1
  0
  AF Parent
  As I understand it, the first one is an argument for value lock in, and the third one is an argument for interpretability, does that seem right to you?
  - Wei Dai 6 Jan 2019 16:32 UTC
    LW: 3 AF: 1
    0
    AF Parent
    For the first one, I guess I would use “argument for defense against value drift” instead since you could conceivably use a goal-directed AI to defend against value drift without lock in, e.g., by doing something like Paul Christiano’s 2012 version of indirect normativity (which I don’t think is feasible but maybe there’s something like it that is, like my hybrid approach, if you consider that goal-directed).
    
    For the third one, I guess interpretability is part of it, but a bigger problem is that it seems hard to make a sufficiently trustworthy human overseer even if we could “interpret” them. In other words, interpretability for a human might just let us see exactly why we shouldn’t trust them.