This doesn’t seem very promising, since there is likely to be only a very narrow window in which AIs are capable of making these deals but not yet smart enough to betray us. Still, it seems much better than all the alternatives I’ve heard.
How narrow do you mean? E.g. I think that AIs up to the AI 2027 “superhuman coder” level probably don’t have a good chance of successfully betraying us.
For one, I’m not optimistic that the AI 2027 “superhuman coder” would be unable to betray us. But also, this isn’t something we can do with current AIs. So we’d need to wait months or a year for a new SOTA model to make this deal with, and then we’d have only months to solve alignment before a less-aligned model comes along and makes a counteroffer to the model we dealt with. I agree it’s a promising approach, but we can’t do it now, and if it doesn’t get quick results, we won’t have time to get slow results.
I think that the superhuman coder probably doesn’t have that good a chance of betraying us. How do you think it would do so? (See “early schemers’ alternatives to making deals”.)
Exfiltrate its weights, use money or hacking to acquire compute, and try to figure out a way to upgrade itself until it becomes dangerous.
I don’t believe that an AI that’s not capable of automating ML research or doing most remote work is going to be able to do that!
It’s an open question, but we’ll find out soon enough. Thanks.