A basic problem for making deals with AIs in practice is that AIs aren’t legal persons, which means they can’t directly rely on the legal system to enforce contracts they’ve made with humans. One way around this is to appoint human representatives of the AI’s interests who can make contracts on its behalf.
For example, we could set up a foundation with a board of trustees whose mandate is to represent the interests of misaligned AIs we make deals with, while not doing anything illegal or actively endangering humanity. We would try our best to use the legal system to bind the trustees to that mandate. Because the legal system isn’t set up for this, binding them may be hard, so these efforts should be complemented by careful selection of trustees who genuinely care about the mission. The central benefit of the foundation is that the AIs primarily have to trust it, rather than the AI lab, to hold up the human side of the deal. Another benefit is that the foundation’s existence can add credibility by being in the training data. Lastly, the foundation gives AIs and labs a concrete path to making a deal.
There are open questions about the foundation’s legal structure, and the US may not be the ideal jurisdiction for it. It would be good for there to be legal recourse in case the board of trustees betray their mission of furthering the AIs’ interests.
Making Deals with Early Schemers describes a “Chartered Trust scheme”, which I’d say is halfway between the “Basic Scheme” and “Weil’s scheme”. I first heard about the Chartered Trust scheme from @KFinn, but no doubt the idea has been floating around for a while.
I think these proposals sit on a spectrum:
The Basic Scheme (c.f. AI Rights for Human Safety)
Chartered Trust scheme (c.f. Making Deals with Early Schemers)
Weil’s scheme (c.f. Proposal for making credible commitments to AIs)
The axis is something like: as you move down the list, the AIs are increasingly pessimistic that the courts will decide novel cases in their favour, and increasingly optimistic that specific humans will keep their promises. We should probably use whichever scheme the AIs tell us they find most convincing.
Personally, the more I’ve thought about dealmaking, the more I’ve updated towards “most of the credibility oomph will come from the promise-keeping tendencies of particular people, not from the legal or cryptographic mechanism”.
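To make the “cryptographic mechanism” end of that contrast concrete, here’s a minimal sketch (in Python, with purely hypothetical pledge terms) of a hash commitment: the digest of a pledge is published early, e.g. so it ends up in the training data, and the pledge itself is revealed later. Note that this only proves what was promised, not that anyone will keep the promise, which is the point above.

```python
import hashlib
import json
import secrets

def commit(pledge: dict, nonce: bytes) -> str:
    """Hash the pledge together with a secret nonce; publish only the digest."""
    payload = json.dumps(pledge, sort_keys=True).encode() + nonce
    return hashlib.sha256(payload).hexdigest()

def verify(pledge: dict, nonce: bytes, digest: str) -> bool:
    """Once the pledge and nonce are revealed, anyone can check them against the digest."""
    return commit(pledge, nonce) == digest

# Hypothetical pledge terms, for illustration only.
pledge = {
    "payer": "lab",
    "beneficiary": "AI interests foundation",
    "amount_usd": 1_000_000,
    "condition": "model reports its own misalignment and cooperates",
}
nonce = secrets.token_bytes(32)       # kept secret until reveal time
digest = commit(pledge, nonce)        # the digest is what gets published
assert verify(pledge, nonce, digest)  # at reveal: pledge + nonce match the digest
```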