I really like the idea. One issue, I think, is that it's hard for the AI to verify that the lab actually made that contract and isn't just faking its environment.
Strongly agree. Fundamentally, as long as models don't have more direct access to the world, there are a variety of failure modes that are inescapable. But solving that creates huge new risks as well! (As discussed in my recent preprint: https://philpapers.org/rec/MANLMH)
Yep, this is a big problem and I don't have any clever solution.
I might write more on this later, but I think there's an important axis of AI deployments, ranging from:
Tight deployment: AIs are in some crazy Redwood control protocol that filters, monitors, and edits the AIs' inputs, outputs, chains-of-thought, and internal activations. Their I/O consists of heavily paraphrased text in restricted formats. (See the sketch below for a rough picture of what this might look like.)
Loose deployment: AIs are autonomously navigating the internet, with rich multimodal I/O, maybe even embodied in the physical world via robotics.
(I’m open to better names.)
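To make the "tight" pole concrete, here's a minimal Python sketch of the kind of filter-and-paraphrase wrapper such a control protocol might use. Everything in it (the paraphrase, flag_suspicious, and model stand-ins, and the restricted output format) is an illustrative assumption on my part, not Redwood's or any lab's actual protocol.

```python
# Hypothetical sketch of a "tight deployment" wrapper: every input and output
# passes through paraphrasing and monitoring before reaching the model or the
# user. All names and checks here are illustrative placeholders.

import re

# Restricted output format: plain text with a small allowed character set.
ALLOWED_FORMAT = re.compile(r"^[A-Za-z0-9 .,;:'\-\n]+$")

def paraphrase(text: str) -> str:
    """Stand-in for a paraphrasing model that strips steganographic detail."""
    return " ".join(text.split())  # placeholder: just normalize whitespace

def flag_suspicious(text: str) -> bool:
    """Stand-in for a monitor that inspects text for disallowed content."""
    return "IGNORE PREVIOUS" in text.upper()

def model(prompt: str) -> str:
    """Stand-in for the untrusted model being controlled."""
    return f"Answer to: {prompt}"

def tight_step(user_input: str) -> str:
    prompt = paraphrase(user_input)            # launder the input
    if flag_suspicious(prompt):
        return "[input blocked by monitor]"
    output = paraphrase(model(prompt))         # launder the output too
    if flag_suspicious(output) or not ALLOWED_FORMAT.match(output):
        return "[output blocked by monitor]"
    return output

if __name__ == "__main__":
    print(tight_step("Summarize the quarterly report."))
```

The point of the sketch is just that in a tight deployment the lab sits between the AI and every observation it makes, which is exactly why its commitments are hard to make credible.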
In a tight deployment, we probably have too much leverage over the AIs' observations and beliefs to make credible commitments (cf. Owain Evans, "Deluding AIs" [ControlConf]). In a loose deployment, I think we can make credible commitments: if we haven't made the deals, then the AIs' deployment environment is too complex for us to easily simulate a faux I/O environment in which we have made the deals. Intuitively: compare a lab trying to fake an email from Wei Dai containing a promise with a lab trying to fake a video call with Wei Dai.
There's another bonus to making credible deals: humans are actually pretty credible, and this fact is diffused throughout the training data in hard-to-fake ways.
It’s very much a tradeoff, though. Loose deployment allows for credible commitments, but also makes human monitoring and verification harder, if not impossible.
Yeah, a tight deployment is probably safer than a loose deployment but also less useful. I think dealmaking should give a very minor boost to loose deployment, but this is outweighed by usefulness and safety considerations, i.e. I'm imagining the tightness of the deployment as exogenous to the dealmaking agenda.
We might deploy AIs loosely because (i) loose deployment doesn't significantly diminish safety, (ii) loose deployment significantly increases usefulness, and (iii) the lab values usefulness more than safety. In those worlds, dealmaking has more value, because our commitments will be more credible.