I suggest that, as a first step, we just aim for an uncompetitive aligned AI: one that might use a lot more training data, or many more invocations of Opt, than the benchmark. (If we can't solve that, it seems fairly strong evidence that a competitive aligned AI is impossible or beyond our abilities. Or if someone proposes a candidate and we can't decide whether it's actually aligned or not, that would also be very useful strategic information that doesn't require the candidate to be competitive.)
Do you already have a solution to the uncompetitive aligned AI problem that you can sketch out? It sounds like you think iterated amplification or debate can be implemented using Opt (in an uncompetitive way), so maybe you can give enough details about that to either show that it is aligned or provide people a chance to find flaws in it?
If dropping competitiveness, what counts as a solution? Is “imitate a human, but run it fast” fair game? We could try to hash out the details in something along those lines, and I think that’s worthwhile, but I don’t think it’s a top priority and I don’t think the difficulties will end up being that similar. I think it may be productive to relax the competitiveness requirement (e.g. to allow solutions that definitely have at most a polynomial slowdown), but probably not a good idea to eliminate it altogether.
If dropping competitiveness, what counts as a solution?
I’m not sure, but mainly because I’m not sure what counts as a solution to your problem. If we had a specification of that, couldn’t we just remove the parts that deal with competitiveness?
Is “imitate a human, but run it fast” fair game?
I guess not, because a human imitation might have selfish goals and not be intent aligned to the user?
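For concreteness, here is a toy sketch of what "imitate a human, but run it fast" might mean with an optimization oracle. Everything here is invented for illustration: the `Opt` stub, the tiny hypothesis class, and the "demonstration" data are stand-ins, and the sketch deliberately says nothing about whether the policy `Opt` returns is intent-aligned; it only has to match the data.

```python
# Toy sketch: use a (stub) optimization oracle to find a policy that
# imitates human demonstrations, then query it as fast as we like.
# The Opt interface and hypothesis class are hypothetical illustrations.
from itertools import product

# "Human demonstrations": input -> action pairs on a tiny boolean task.
demonstrations = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def imitation_loss(policy, data):
    """Fraction of demonstrations the policy gets wrong."""
    return sum(policy(x) != y for x, y in data) / len(data)

def Opt(loss, hypothesis_class):
    """Stub oracle: return the hypothesis minimizing the loss.
    A real Opt would search an astronomically larger space, and nothing
    here rules out a zero-loss policy with arbitrary off-distribution
    (or 'selfish') behavior."""
    return min(hypothesis_class, key=loss)

# Hypothesis class: all lookup tables over the four possible inputs.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

def make_policy(table):
    mapping = dict(zip(inputs, table))
    return lambda x: mapping[x]

hypotheses = [make_policy(t) for t in product([0, 1], repeat=4)]

imitator = Opt(lambda p: imitation_loss(p, demonstrations), hypotheses)

# "Run it fast": query the learned imitator far faster than the human could answer.
assert imitation_loss(imitator, demonstrations) == 0.0
print([imitator(x) for x in inputs])  # -> [0, 1, 1, 0], matching the demonstrations
```

The sketch makes the worry above concrete: `Opt` only sees the imitation loss, so any policy that reproduces the demonstrations is an acceptable answer, including one whose behavior elsewhere is unaligned.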
We could try to hash out the details in something along those lines, and I think that’s worthwhile, but I don’t think it’s a top priority and I don’t think the difficulties will end up being that similar.
What about my suggestion of hashing out the details of how to implement IDA/debate using Opt and then seeing if we can decide whether or not it's aligned?
The point of working in this setting is mostly to constrain the search space or make it easier to construct an impossibility argument.