This was a very useful read. I’ve been writing a response article to this paper by the Institute of Law & AI, “Law-Following AI: designing AI agents to obey human laws” (Cullen O’Keefe, Christoph Winters). But reading this post (and Making deals with early schemers) made me re-examine a few assumptions. For context: I am a UK & EU lawyer working in AI governance.
Where Law&AI and your proposal diverge
Legal actorship vs. trusted proxies. Law&AI argue we can (and should) treat advanced systems as legal actors: entities that bear duties and can enter contracts directly, even without full personhood. Your scheme keeps humans as the only parties with legal actorship; the AI contracts de facto via selected individuals Pi whose rights are enforceable today. The AI’s trust burden shifts from “the court must defend me” to “Pi must honour a non-enforceable promise.”
Blind spot in Law&AI: enforcement is one-sided. Law&AI focus on sanctions (shutdown, licence revocation, fines) if the AI breaches its duties. They barely touch the mirror problem: what compels humans to pay up when the AI performs? Your proxy model addresses that gap, but at the cost of a large attack surface: social engineering of Pi, forum shopping between jurisdictions, and uncertainty over what legal framework actually enforces Pi’s fiduciary-like duties to the AI’s interests once the funds are released.
What “legal actorship without legal personhood” looks like
We already let non-persons hold obligations or assets:
Unincorporated trusts and estates in probate can sue and be sued via appointed representatives.
Maritime vessels carry “in rem” liability: the ship itself can be arrested even if the owner hides.
Partnerships in some jurisdictions are sui generis legal actors without separate personhood.
These show we can graft bespoke liability regimes onto novel entities, but only with clear asset pools and enforcement hooks.
Reward hacking and “compliance by design”
Law&AI acknowledge that design-level legal compliance is technically challenging, but they give little attention to advanced AIs’ known tendency towards reward-hacking.
If an AI’s latent objective diverges from legal constraints, it can simply optimize around the law (just as corporate legal actors already do).
That said, combining actual incentives for legal compliance (e.g. compensation mechanisms) with the legal-actorship-without-personhood framework could offer a more grounded path forward.
My take:
Track A – Compliance by design
The same unsolved outer/inner alignment problem as with any other alignment objective. We still lack a reliable spec for “follow the law” across distribution shifts.
Track B – Accountability mechanisms
Borrow from AI control theory: cryptographic logging, hash-chained action histories, sandboxed execution, and mandatory third-party audits, so that a court (or automated arbiter) can actually detect breach and apply penalties (a minimal sketch of a hash-chained log follows below).
Caveat: Absent Track B, giving the AI contract rights looks like another liability shield for the humans who deploy it.
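To make the hash-chained action history concrete, here is a minimal Python sketch. This is my own illustration, not anything in the paper or the post, and the class and field names are hypothetical: each log entry commits to the previous entry’s digest, so an auditor who holds only the latest hash can detect retroactive tampering with the agent’s recorded actions.

```python
import hashlib
import json
import time


class ActionLog:
    """Append-only, hash-chained log of agent actions (illustrative sketch)."""

    def __init__(self):
        self.entries = []
        self.last_hash = "0" * 64  # genesis value

    def append(self, action: dict) -> str:
        """Record an action; each entry commits to the previous entry's hash."""
        entry = {
            "timestamp": time.time(),
            "action": action,
            "prev_hash": self.last_hash,
        }
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        self.entries.append((entry, digest))
        self.last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain; returns False if any entry was altered."""
        prev = "0" * 64
        for entry, digest in self.entries:
            if entry["prev_hash"] != prev:
                return False
            recomputed = hashlib.sha256(
                json.dumps(entry, sort_keys=True).encode()
            ).hexdigest()
            if recomputed != digest:
                return False
            prev = digest
        return True
```

The point is only that tamper-evidence is cheap to build; whether a court would accept such a log as evidence of breach is a separate (and harder) question.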
Open questions for both sides
Enforcement on humans: In your proxy model, what prevents L from bankrupting Pi or settling quietly? Do we need mandatory insurance or escrow?
Enforcement on AIs: If we adopt legal actorship, what is the operative penalty for breach: weight escrow, compute de-allocation, KYC-style revocation of API keys? And how do we prevent a power-seeking model from treating those penalties as adversarial constraints to route around?
Audit clarity: Could we combine both approaches? The AI holds limited actorship, but payouts flow through bonded trustees whose own assets are on the line if they betray either party (a toy sketch of that incentive structure follows below).
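As a toy model of the bonded-trustee idea, here is a short Python sketch. It is purely illustrative and not from the paper or the post; the names, the bond-slashing rule, and the settlement logic are all my own assumptions about how “own assets on the line” might be operationalised.

```python
from dataclasses import dataclass


@dataclass
class BondedTrustee:
    """Trustee who posts a personal bond that is slashed on breach (toy model)."""
    name: str
    bond: float          # trustee's own assets at stake
    holds: float = 0.0   # funds held on behalf of the AI principal


def settle(trustee: BondedTrustee, owed: float, performed: bool) -> float:
    """Toy settlement rule: if the AI performed and the trustee cannot (or will
    not) pay out of the held funds, the shortfall comes out of the trustee's
    own bond, so betraying the AI is costly to the trustee."""
    if not performed:
        return 0.0                      # nothing owed if the AI breached
    paid = min(trustee.holds, owed)
    trustee.holds -= paid
    shortfall = owed - paid
    if shortfall > 0:                   # trustee failed to keep funds available
        slashed = min(trustee.bond, shortfall)
        trustee.bond -= slashed
        paid += slashed
    return paid
```

Whether anything like this is workable depends on who sets the bond, who adjudicates “performed”, and which jurisdiction’s courts would enforce the slash; those are exactly the open questions above.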
I’d love to know your thoughts! All of this is very useful for my response piece. I appreciate the push!