A basic problem for making deals with AIs in practice is that AIs aren’t legal persons, which means that they can’t directly rely on the legal system to enforce contracts they’ve made with humans.
I came across this post at the right time! I am writing a follow-up on this paper by the Institute for Law & AI, “Law-Following AI: designing AI agents to obey human laws” (Cullen O’Keefe, Christoph Winter).
The paper argues that AI agents should be explicitly designed to follow human laws, not just guided by human values. But, anticipating that this “compliance by design” approach is very likely to fail, it also proposes treating AIs as legal actors: entities that can bear legal duties without requiring legal personhood, so that accountability holds even if alignment fails.
Treating AI agents as legal actors without legal personhood would allow them to enter into legally binding contracts and be held accountable for breach of contract, which would eliminate the “AI trustee” requirement.
However, the paper doesn’t pin down what “consequences” would actually mean here (shutdown? API isolation? asset seizure? and how do we ensure the AIs can’t simply bypass them?). Still, it’s an alternative to proxy representation, and potentially a cleaner one from a governance perspective.
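To make this concrete, here is a minimal sketch (my own illustration, not anything proposed in the paper or the post) of one way “consequences” could be operationalized: the AI’s compensation sits in escrow and is released or forfeited depending on an adjudicated compliance ruling. The class names, amounts, and adjudication rule are all assumptions made for the example.

```python
# Hypothetical sketch: escrowed compensation as one concrete "consequence"
# for an AI legal actor. Not from the paper or the post; purely illustrative.

from dataclasses import dataclass, field
from enum import Enum, auto


class DealStatus(Enum):
    PENDING = auto()    # obligations not yet adjudicated
    PAID_OUT = auto()   # compliance verified, escrow released to the AI
    FORFEITED = auto()  # breach adjudicated, escrow returned to the counterparty


@dataclass
class EscrowedDeal:
    """A deal whose compensation is held in escrow until adjudication."""
    ai_id: str
    counterparty: str
    escrow_amount: float
    status: DealStatus = DealStatus.PENDING
    log: list = field(default_factory=list)

    def adjudicate(self, compliant: bool, adjudicator: str) -> None:
        """Record an adjudication outcome and apply the corresponding consequence."""
        if self.status is not DealStatus.PENDING:
            raise ValueError("deal already adjudicated")
        if compliant:
            self.status = DealStatus.PAID_OUT
            self.log.append(f"{adjudicator}: compliance verified, "
                            f"{self.escrow_amount} released to {self.ai_id}")
        else:
            self.status = DealStatus.FORFEITED
            self.log.append(f"{adjudicator}: breach found, "
                            f"{self.escrow_amount} returned to {self.counterparty}")


# Example: an AI agrees to reveal evidence of its own misalignment in exchange
# for escrowed compensation; an audit panel later rules on whether it complied.
deal = EscrowedDeal(ai_id="agent-123", counterparty="lab-governance-board",
                    escrow_amount=10_000.0)
deal.adjudicate(compliant=False, adjudicator="independent-audit-panel")
print(deal.status, deal.log[-1])
```

Even this toy version immediately surfaces the hard parts: who adjudicates, what counts as evidence of breach given deception and limited auditability, and whether forfeiting escrowed assets is a consequence the AI actually cares about.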
In the follow-up piece I’m writing, I suggest that the real research bifurcation is this:
Path A – Compliance by Design:
This is essentially the same outer/inner alignment problem we’re already dealing with: getting the system to actually follow legal constraints as part of its optimization behavior.
Path B – Accountability Mechanisms for AIs as Legal Actors:
This path assumes compliance will fail in many cases, and it’s the one I’d encourage more research on: designing control mechanisms and legal procedures that let us hold misaligned systems accountable without unfairly collapsing responsibility onto humans. It fits more naturally with AI control.
Questions I’d love your take on
Assuming AIs could directly enter legally binding agreements (i.e., if legal actorship were granted), how would that change your approach to the “Setting up infrastructure to pay the AIs” section? What legal considerations do you think we’d have to keep in mind to enable this infrastructure?
What does enforcement look like in your view if an AI breaches the deal?
Given what we know about deception, reward hacking, and failure modes in auditability: what are the actual penalties or mechanisms that would incentivize AI agents to take legal consequences seriously, let alone “be scared” of them?
Thank you in advance! I really think that discussing these questions from both a legal and a technical perspective helps shift the Overton window.