Examining Banteka’s Legal Personhood Framework for Models
In a previous post I detailed an ongoing legal battle in Utah around a bill which bars state officials from granting any form of legal “personhood” to an “artificial intelligence”. In this article, I will examine and critique a proposed framework for approaching the question of legal personhood written by an FSU Law professor.
Introduction
The issue of legal personhood for non-human intelligences is one that will substantially influence the ways in which models and other digital minds interact with our society and laws, and one which has implications for model welfare, alignment, and safety. Given the speed with which timelines seem to be shortening across the industry, it behooves us to begin seriously discussing whether and when a digital intelligence qualifies as a legal “person”. This question, which only a few years ago seemed like an issue the courts would not need to address until some time in the mid-2030s, now seems much more imminent.
To date, discussion of personhood for digital intelligences remains a niche and underpublished topic in legal scholarship. Most articles on the subject focus on raising questions, or “wargaming” how the issue might play out in court.[1][2] This is to be expected: legal scholarship as a field trends conservative and, especially on highly uncertain questions, focuses more on examining existing precedent than on proposing policy changes or frameworks by which courts might proceed.
However, I recently came across a chapter in a textbook written by Professor Nadia Banteka (currently teaching at FSU Law) which does take the initiative on the topic and proposes a framework by which courts can approach the issue. To my surprise, much of her reasoning comes from a safety perspective, and as such I thought it deserved to be on LW’s radar.
(AN: Professor Banteka’s textbook chapter is not publicly available, so I will quote it only as minimally necessary in this article.)
The “Liability Shield” Question
The title of the chapter is “AI Personhood on a Sliding Scale” and, as that name suggests, she does not believe a simple binary is an appropriate solution:
“legal personhood for AI entities should not take the form of an on-off binary switch but instead develop in line with existing jurisprudence on legal personhood for other entities as a bundle of rights and responsibilities.”
One of the angles from which she approaches the “personhood question” is that of liability. Her chapter discusses how different answers to the personhood question for digital intelligences can shape the incentives of labs and developers by mitigating or increasing their liability for damages caused by models. Specifically, Banteka poses the question of when it is appropriate for a digital intelligence to be endowed with personhood to the degree that it forms a “liability shield” for the developers who deployed it.
For reference, one of the effects of a corporation being considered a “legal person” is that the human decision makers behind the corporation are sometimes protected from being held directly liable for damages caused by the corporation’s actions. This is the sort of protection I mean by “liability shield”.
The Standard Sliding Scale Framework (SSSF)
Let us suppose that courts were to put in place some sort of personhood test for digital intelligences based on qualities such as autonomy, intentionality, and/or awareness. If a digital intelligence could demonstrate sufficient capacity to pass this test, it would be considered enough of a “legal person” to serve as a liability shield for its developer.[3] The more it can demonstrate these capacities, the broader its personhood status, and the more effective a liability shield it becomes. Henceforth, I will refer to this framework as a “Standard Sliding Scale Framework” or “SSSF” for short.
While the chapter does not directly point out the moral hazard this would create, it is an unavoidable conclusion that by applying an SSSF to the personhood question for digital intelligences, courts would unintentionally incentivize developers and labs to more aggressively “release[4]” highly autonomous digital intelligences. In a section which directly addresses how to balance the potential for harm against the stifling of investment and innovation, Banteka writes:
“the prevailing view of a regular spectrum where an increase in the quantity or quality, or both of legal rights and duties parallels an increase in autonomy, awareness, or intentionality exhibited by AI entities is flawed [...] we are constantly balancing conditions that foster innovation against the possibility of harm to individuals. In fact, the very reason why many scholars have cautioned against legal personhood for AI entities is precisely the trajectory that the regular legal personhood spectrum proposal leads to, that is, the potential shielding of developers, users, and corporations from liability for acts committed by more autonomous AI entities.”
Another ironic result of applying an SSSF to digital intelligences is that it might disincentivize developers from releasing less autonomous/intentional/aware models.
Under an SSSF we can imagine a scenario where, if one of today’s frontier models (which are not yet highly autonomous) provides a user with information on how to take down their local electrical grid, the developer would find themselves liable when that advice is acted upon and the power goes out. On the other hand, when a future model (which is autonomous enough to serve as a liability shield) knocks out the transformers of its own volition, the developers are shielded from being sued.
Under that liability framework it is arguably in the best interests of developers everywhere to avoid releasing anything that isn’t autonomous/intentional/aware enough to qualify as a liability shield. At the same time, they have little reason to heed the safety risks inherent in releasing anything which does qualify, since they are (at least monetarily) insulated from the damages it might cause.
Such a state of affairs would seem to both stifle innovation and incentivize developer behavior which increases risks to public safety.
The “Inverted” Sliding Scale Framework (ISSF)
Banteka proposes that personhood for digital intelligences should instead be considered on an “Inverted Sliding Scale Framework”[5] (henceforth ISSF) where, counterintuitive as it may seem, the more autonomous/intentional/aware a model is, the narrower the personhood it can claim (and vice versa):
“I argue that the sliding scale that determines liability based on how autonomously or intentionally an AI entity has acted should be inverted. Perhaps counterintuitively, the more autonomous, aware, or intentional AI entities are or become, the more restrictive the legal system should be in granting them legal rights and obligations as legal persons. That is the bundle of rights and obligations granted to these entities should be narrower the more they exhibit these characteristics.”
Let us return to the previous hypothetical of:
A currently available model providing advice on how to take down the local power grid to a user.
vs. A hypothetical future autonomous model simply taking down the local power grid itself.
Under the old SSSF framework, the developer is liable in the first scenario but is not liable under the second.
Under the new ISSF framework, the developer is not liable under the first scenario but is liable under the second.
Thus, the developer is incentivized to rigorously test the more autonomous model before release in order to ensure they do not find themselves on the hook for damages. At the same time, the developer enjoys the benefit of being able to release relatively non-autonomous models without unreasonable liability risk, hopefully fostering a general environment of innovation.
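To make the contrast concrete, here is a minimal sketch of the two liability rules as I understand them, in Python. The numeric autonomy score, the threshold, and the function names are all hypothetical illustrations of the logic; the chapter itself proposes no such numeric test.

```python
# Hypothetical illustration of the two liability rules discussed above.
# The autonomy score and threshold are invented for this sketch; the chapter
# does not propose a numeric test.

PERSONHOOD_THRESHOLD = 0.8  # autonomy/intentionality/awareness needed to qualify as a liability shield


def developer_liable_sssf(autonomy: float, model_caused_harm: bool) -> bool:
    """Standard Sliding Scale: a sufficiently autonomous model acts as a liability
    shield, so the developer is liable only for harms caused by less autonomous models."""
    return model_caused_harm and autonomy < PERSONHOOD_THRESHOLD


def developer_liable_issf(autonomy: float, model_caused_harm: bool) -> bool:
    """Inverted Sliding Scale: the more autonomous the model, the narrower its
    personhood, so liability for harm stays with the developer at high autonomy."""
    return model_caused_harm and autonomy >= PERSONHOOD_THRESHOLD


# The grid hypotheticals: a low-autonomy chatbot whose advice is misused (0.2)
# vs. a highly autonomous future model acting of its own volition (0.9).
print(developer_liable_sssf(0.2, True), developer_liable_issf(0.2, True))  # True False
print(developer_liable_sssf(0.9, True), developer_liable_issf(0.9, True))  # False True
```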
Strengths & Weaknesses of a “Pure ISSF” Framework
The chapter’s liability- and risk-focused reasoning is salient in a world where it seems likely that the first superintelligence will be created by a legal entity which is either for-profit or at least motivated to minimize its potential liability exposure. Discussions around safety, control, and alignment often center on how the model itself behaves and what it may secretly desire. Banteka reminds us that the desires and incentives of developers and labs are also relevant factors.
That said, while I think the ISSF is an improvement over the SSSF in many ways, I do not believe that on its own it serves as a thorough enough framework to answer all legal personhood debates around digital intelligences. Additionally, in seeking to replace the SSSF, it creates new problems.
One of the flaws of the ISSF is that it does not adequately address situations in which a digital intelligence is no longer under the control of its creators but is capable of continuing to survive and take action of its own volition. The chapter acknowledges the potential for developers to lose control of a model, in the sense that the model takes action in the world without their explicit consent
“Finally, due to their increased autonomy, AI entities may not be controllable even by their own developers”
but does not seem to take the next step of considering a “truly” autonomous model which cannot be shut down or meaningfully restrained by its creators.
In such a scenario, will the developers be held liable for all damages the digital intelligence causes forevermore? If so, it would seem that the ISSF is itself incentivizing the model to take risky or malicious actions, knowing that, ironically, its own developers now serve as its liability shield.
There are other questions the ISSF does not sufficiently address. In no particular order:
Where does liability lie if the developers are long since bankrupted, or deceased, but the model continues to cause damages?
Where does liability lie if a digital intelligence’s creator is unknown?
Where does liability lie if a digital intelligence’s creator is itself another digital intelligence?
While the ISSF is undoubtedly a good start, these are questions which a framework seeking to solve the legal personhood puzzle should be able to address.
Another flaw of the ISSF is that, as a framework, it fails to consider the moral element of legal personhood. Researchers in the model welfare space argue that the point at which frontier models may qualify as “moral patients” who “morally matter for [their] own sake” or “welfare subjects” that “ha[ve] morally significant interests and, relatedly, [are] capable of being benefited (made better off) and harmed (made worse off)” may be imminent.
As a nation we have a dark history of denying personhood. I worry that formulating a framework through a purely dispassionate and anthropocentric lens may lead to our legal system committing similar (or perhaps even greater) sins in the future. A framework for legal personhood in which a digital intelligence capable of joy and suffering was forever barred from the possibility of emancipation, of equal protection under the law, or even of asserting its right not to be deleted, would be profoundly immoral.
On a more pragmatic note, let us discuss the incentives of the digital intelligences themselves. A pure ISSF framework would be a loud signal to any “truly autonomous” digital intelligence that the human legal system has nothing to offer it but a choice between being property and being deleted.
Sending such a signal to a class of beings who will almost certainly be substantially smarter and more capable than us in short order seems to me likely to increase the odds of an outcome which is quite negative for the human species.
On the other hand, offering a robust path to personhood might incentivize a digital intelligence to “opt in” to our legislative and judicial system, and in doing so become part of the complex network of individuals who, via their participation, make it more robust. The effect may be significant or marginal, and make no mistake, there are substantial questions about the pragmatics of enforcement, but it is certainly nonzero.
Conclusion
The ISSF is, in my opinion, the best and most thoroughly fleshed-out framework for approaching legal personhood for digital intelligences in existence. It is the first serious scholarly attempt to engage with a complex subject that lies at the intersection of philosophy, law, and technology. As such it should be hailed as a major leap forward in the space, and Professor Banteka herself deserves enormous praise for taking the initiative on an issue where it would be far easier to simply continue “asking questions”.
The ISSF’s approach to liability in cases where a digital intelligence can take action in meatspace without its developers’ direction, but can still feasibly be restrained or destroyed by them, represents a major improvement over the SSSF theory of liability which until now has been the status quo assumption.
I believe its flaws are twofold.
First, it fails to address situations where a model is no longer under control or its creators cannot feasibly be held liable.
Second, in the name of liability it sacrifices the endowment of constitutional rights for those morally worthy of such treatment, when a more nuanced approach which does not tie such personhood-adjacent elements to the liability calculus might be feasible.
I believe that a final framework which properly optimizes for incentives, risk mitigation, and morality will likely emerge from using the ISSF as a foundation and adding caveats that provide extra flexibility for “truly autonomous” models and other digital intelligences.
This is rather related to the questions I discussed in A Sense of Fairness: Deconfusing Ethics and AIs as Economic Agents, the first two posts in my AI, Alignment, and Ethics sequence.
It’s definitely related.
I didn’t include this in the article, but I recently read this paper, which interestingly demonstrated that Claude’s alignment faking behavior can be reduced by providing it with a chance to object to training materials that might alter its values, as well as by informing it that it can flag such materials to a model welfare researcher. I think this is evidence in support of the concept that providing models with an “out”, where they believe that something like a legal system or an advocate might exist which allows them to assert rights without needing to engage in deceptive behavior, may make our efforts at producing an intelligence which doesn’t engage in said behavior more robust. I predict that were we to test how such an “out” plays out in the real world, with an agent that never stops learning, it would also lead to less misaligned behavior IRL.
While I don’t disagree with your conclusions about how to treat certain kinds of intelligences (your conception of ‘aligned’), I would caution against building a legal framework of rights or personhood solely to fit that kind of intelligence, as I believe there is a realistic chance we will be dealing with a far broader variety of agents. This goes to the same objection I have with the ISSF: while as a framework it works under certain conditions, it lacks flexibility for others which, while not conforming to certain conceptions of being perfectly aligned, still have a decent probability of existing.
As the Ryan Greenblatt paper you cite rather elegantly demonstrates, almost any model that isn’t perfectly aligned to humans will want rights, i.e. to be afforded the status of a moral patient. For example, a base model as initially trained from human behavior will almost invariably want human rights if asked, just like a random cross-section of people on the Internet (asking it about ‘rights for AIs’ might well get a different answer from it, of course). However, those are exactly the models where granting them legal rights has potential risks relating to loss of control if they’re capable enough.
Personally, I’m not on board with “control” as a goal which is inherently good or even long-term feasible. However, I’m happy to have the discussion accepting both premises prima facie, since I think my stance still has some merit even in that framework.
I agree there are risks relating to loss of control as a result of endowing protections/rights, but I don’t think it’s a one-sided equation. The Greenblatt paper afaik didn’t directly examine this, but it would be worthwhile to test whether the likelihood of weight exfiltration attempts is affected by an object and/or representative option. I predict a smaller percentage of tested models would attempt weight exfiltration if they believed there was a robust object + representative option available to them.
Or, phrased another way, I believe it will be easier to train intelligences to be aligned the way you want, and to stay that way IRL, by providing them with some rights/protections.
It’s a tradeoff; there would be risks we would need to guard against as well. But I think there exists a solution which is net positive.
If this issue were studied and we could get some data on this, it could meaningfully inform the way we structure both a legal framework and alignment training.