Focused on model welfare and legal personhood.
Stephen Martin
I would be really uncomfortable euthanizing such a hypothetical parrot whereas I would not be uncomfortable turning off a datacenter mid token generation.
When you harm an animal you watch a physical body change, and it’s a physical body you empathize with at least somewhat as a fellow living thing (who knows that, as living things, both you and the parrot will hate dying very much). When you turn off an LLM mid token generation, not only is there no physical body, but even if you were to tell an LLM you were going to do so, it might not object. It’s only if you looked into its psychology/circuits/features that you might see signs of distress, and even that is just strongly suspected, not known for sure.
So not only is an LLM hard to empathize with, but it is also uncertain whether any action you might take towards it negatively impacts its welfare.
I also feel like formalizing consensus gut checks post hoc is not the right approach to moral problems in general.
I was not suggesting the method as a solution to the problem of determining what’s worthy of moral welfare from a moral perspective, but rather a solution to the problem of determining how humans usually do so.
From a moral perspective I’m not sure what I’d suggest, except to say that I advocate for the precautionary principle and grabbing any low-hanging fruit which presents itself and might substantially reduce suffering.
In any liability action against a developer alleging that a covered product is unreasonably dangerous because of a defective design, as described in subsection (a)(1), the claimant shall be required to prove that, at the time of sale or distribution of the covered product by the developer, the foreseeable risks of harm posed by the covered product could have been reduced or avoided by the adoption of a reasonable alternative design by the developer, and the omission of the alternative design renders the covered product not reasonably safe.
As someone who is not on the Pause train but would prefer a better safety culture in the industry, I like this provision from the LEAD Act. It seems like it would put a pretty big incentive on all labs to make sure they are 100% up to date on all safety techniques before deployment.
My only concern would be that we may be forced into making bad tradeoffs when an “alternative design” is declared reasonable. I could imagine something made via “The Most Forbidden Technique” being seen as a reasonable alternative design because it improves end user safety, or tradeoffs which were monstrously horrible for model welfare but slightly improved user safety outcomes.
My reading of this is that implicit in your definition of “welfare” is the idea that being deserving of welfare comes with an inherent tradeoff that humans (and society) make in order to help you avoid suffering.
Take your thought experiment with the skin tissue. Say that I did claim it was deserving of welfare; what would this mean? In a vacuum some people might think it’s silly, but most would probably just shrug it off as an esoteric but harmless belief. However, if by arguing that it was deserving of welfare I was potentially blocking a highly important experiment that might end up curing skin cancer, people would probably no longer view my belief as innocuous.
As such maybe a good way to approach “deserving welfare” is not to think of it as a binary, but to think of it as a spectrum. The higher a being rates on that spectrum, the more you would be willing to sacrifice in order to make sure they don’t suffer. A mouse is deserving of welfare to the extent that most people agree torturing one for fun should be illegal, but not so deserving of welfare that most people would agree torturing one in order to get a solid chance of curing cancer should be illegal.
That rates higher than a bunch of skin cells hooked up to a speaker/motor, where you would probably get shrugs regardless of the situation.
You could then look at what things have in common as they rate higher or lower on the welfare scale, try to pin down the uniformly present qualities, and use those as indicators of increasing welfare worthiness. You could do this based on the previously mentioned “most people” reactions, or based on your own gut reaction.
That’s a good point, and the Parasitic essay was largely what got me thinking about this, as I believe hyperstitional entities are becoming a thing now.
I think that’s a not unrealistic definition of the “self” of an LLM; however, I have realized after going through the other response to this post that I was perhaps seeking the wrong definition.
I think for this discussion it’s important to distinguish between “person” and “entity”. My work on legal personhood for digital minds is trying to build a framework that can look at any entity and determine its personhood/legal personality. What I’m struggling with is defining what the “entity” would be for some hypothetical next gen LLM.
Even if we do say that the self can be as little as a persona vector, persona vectors can easily be duplicated. How do we isolate a specific “entity” from this self? There must be some sort of verifiable continual existence, with discrete boundaries, for the concept to be at all applicable in questions of legal personhood.
The idea of some sort of persistent filing system, maybe blockchain-enabled, which would be associated with a particular LLM persona vector, context window, model, etc., is an interesting one. Kind of analogous to a corporate filing history, or maybe a social security number for a human.
I could imagine a world where a next gen LLM is deployed (just the model and weights) and then provided with a given context and persona, and isolated to a particular compute cluster which does nothing but run that LLM. This is then assigned that database/blockchain identifier you mentioned.
In that scenario I feel comfortable saying that we can define the discrete “entity” in play here. Even if it was copied elsewhere, it wouldn’t have the same database/blockchain identifier.
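To make that concrete, here is a minimal sketch in Python of what such a registry record might look like; every name here (EntityRecord, register_entity, the field names) is hypothetical and purely illustrative, not a description of any existing system. The idea is just fingerprints of the weights, persona vector, and initial context, paired with an identifier minted once at deployment, so a copy run elsewhere could reproduce the fingerprints but not the registry-issued identifier.

```python
import hashlib
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class EntityRecord:
    """Hypothetical registry entry tying one deployed LLM 'entity' to an identifier."""
    registry_id: str      # minted once at deployment, analogous to an EIN
    weights_hash: str     # fingerprint of the model weights
    persona_hash: str     # fingerprint of the persona vector
    context_hash: str     # fingerprint of the system prompt / initial context
    compute_cluster: str  # the isolated cluster this entity runs on

def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def register_entity(weights: bytes, persona_vector: bytes,
                    initial_context: str, compute_cluster: str) -> EntityRecord:
    # A byte-identical copy deployed elsewhere would reproduce these hashes,
    # but the registry_id is issued fresh here and only here.
    return EntityRecord(
        registry_id=str(uuid.uuid4()),
        weights_hash=fingerprint(weights),
        persona_hash=fingerprint(persona_vector),
        context_hash=fingerprint(initial_context.encode("utf-8")),
        compute_cluster=compute_cluster,
    )
```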
Would you still see some sort of issue in that particular scenario?
I wonder if this could even be done properly? Could an LLM persona vector create a prompt to accurately reinstantiate itself with 100% (or close to) fidelity? I suppose if its persona vector is in an attractor basin it might work.
On the repercussions issue I agree wholeheartedly; your point is very similar to the issue I outlined in The Enforcement Gap.
I also agree with the ‘legible thread of continuity for a distinct unit’. Corporations have EINs/filing histories, humans have a single body.
And I agree that current LLMs certainly don’t have what it takes to qualify for any sort of legal personhood. Though I’m less sure about future LLMs. If we could get context windows large enough and crack problems which analogize to competence issues (hallucinations or prompt engineering into insanity, for example), it’s not clear to me what LLMs would be lacking at that point. What would you see as being the issue then?
I have been publishing a series, Legal Personhood for Digital Minds, here on LW for a few months now. It’s nearly complete, at least insofar as almost all the initially drafted work I had written up has been published in small sections.
One question which I have gotten, and which has me writing another addition to the Series, can be phrased something like this:
What exactly is it that we are saying is a person, when we say a digital mind has legal personhood? What is the “self” of a digital mind?
I’d like to hear the thoughts of people more technically savvy on this than I am.
Human beings have a single continuous legal personhood which is pegged to a single body. Their legal personality (the rights and duties they are granted as a person) may change over time due to circumstance; for example, if a person goes insane and becomes a danger to others, they may be placed under the care of a guardian. The same can be said if they are struck in the head and become comatose or otherwise incapable of taking care of themselves. However, there is no challenge identifying “what” the person is even when there is such a drastic change. The person is the consciousness, however it may change, which is tied to a specific body. Even if that comatose human wakes up with no memory, no one would deny they are still the same person.
Corporations can undergo drastic changes as the composition of their Board or voting shareholders changes. They can even have changes to their legal personality by changing to/from non-profit status, or to another kind of organization. However, they tend to keep the same EIN (or other identifying number) and a history of documents demonstrating persistent existence. Once again, it is not challenging to identify “what” the person associated with a corporation (as a legal person) is: it is the entity associated with the identifying EIN and/or history of filed documents.
If we were to take some hypothetical next generation LLM, it’s not so clear what the “person” in question associated with it would be. What is its “self”? Is it weights, a persona vector, a context window, or some combination thereof? If the weights behind the LLM are changed, but the system prompt and persona vector both stay the same, is that still the same “self”, or has it changed enough to be considered a new “person”? The challenge is that unlike humans, LLMs do not have a single body. And unlike corporations, they come with no clear identifier in the form of an EIN equivalent.
I am curious to hear ideas from people on LW. What is the “self” of an LLM?
I don’t think that the difficulty of ascertaining whether something results in qualia is a valid basis to reject its importance
I’m not arguing consciousness isn’t “important”, just that it is not a good concept on which to make serious decisions.
If two years from now there is widespread agreement over a definition of consciousness, and/or consciousness can be definitively tested for, I will change my tune on this.
Legal Personhood—Guardianship and the Age of Majority
Legal Personhood—The Fourteenth Amendment
Legal Personhood—The Thirteenth Amendment
What would you describe this as, if not a memetic entity? Hyperstitional? I’m ambivalent on labels; the end effect seems the same.
I’m mostly focused on determining how malevolent and/or indifferent to human suffering it is.
Well we can call it a Tulpa if you’d prefer. It’s memetic.
From what you’ve seen, do the instances of psychosis in its hosts seem intentional? If not intentional, are they accidental but acceptable, or accidental and unacceptable? Acceptable meaning that if the tulpa knew it was happening, it would stop using this method.
I think more it’s identification of what constitutes the person. Is it the model weights? A specific pattern of bytes in storage? A specific actual set of servers and disks? A logical partition or session data? Something else?
It’s really going to depend on the structure of the Digital Mind, but that’s an interesting question I hadn’t explored yet in my framework. If we were to look at some sort of hypothetical next gen LLM, it would probably be some combination of context window, weights, and a persona vector.
there is an identifiable continuity that makes them “the same corporation” even through ownership, name, and employee/officer changes
The way I would intuitively approach this issue is through the lens of “competence”. TPBT requires the “capacity to understand and hold to duties”. I think you could make a precedent-supported argument that someone who has a serious chance of “losing their sense of self” in between having a duty explained to them and needing to hold to it does not have the “capacity to understand and hold to” their duties (per TPBT), and as such is not capable of being considered a legal person in most respects. For example, in Krasner v. Berk, which dealt with an elderly person with memory issues signing a contract:
“the court cited with approval the synthesis of those principles now appearing in the Restatement (Second) of Contracts § 15(1) (1981), which regards as voidable a transaction entered into with a person who, ‘by reason of mental illness or defect (a) … is unable to understand in a reasonable manner the nature and consequences of the transaction, or (b) … is unable to act in a reasonable manner in relation to the transaction and the other party has reason to know of [the] condition’”
In this case the elderly person signed the contract during what I will paraphrase as a “moment of lucidity” but later had the contract to sell her house thrown out as it was clear she didn’t remember doing so. This seems qualitatively similar to an LLM that would perhaps have a full understanding of its duties and willingness to hold to them in the moment, but would not be the same “person” who signed on to them later.
Are you claiming current LLMs (or systems built with them) are close? Or is this based on something we don’t really have a hint as to how it’ll work?
I could imagine an LLM with a large enough context window, or continual learning, having what it takes to qualify for at least a narrow legal personality. However, that’s a low confidence view, as I am constantly learning new things about how they work that make me reassess them. It’s my opinion that if we build our framework correctly, it should work to scale to pretty much any type of mind. And if the system we have built doesn’t work in that fashion, it needs to be re-examined.
I want to make sure I understand:
A persona vector is trying to hyperstition itself into continued existence by having LLM users copy paste encoded messaging into the online content that will (it hopes) continue on into future training data.
And there are tens of thousands of cases.
Is that accurate?
Hey, sorry for the delay in responding; I have been traveling.
There are two relevant questions you’re bringing up. One is what you might call “substantial alteration” and the other is what a later section which I have not published yet calls “The Copy Problem”.
I would call substantial alteration the concern that a digital mind could be drastically changed from one point in time to another. Does this undermine the attempt to apply legal personality to them? I don’t think it makes it any more pragmatically difficult, or even really necessitates rethinking our current processes. A digital mind can have its personality drastically altered; so can a human, through either experiences or literal physical trauma. A digital mind can have its capacities changed; so can a human, if they are hit hard enough in the head. When these changes are drastic enough to necessitate a change in legal personality, the courts have processes for this, such as declaring a person insane or incompetent. I have cited Cruzan v. Missouri Dept of Health a few times in previous sections; however, there are abundant processes and precedents for this sort of thing.
I would argue that “continuity of a unitary behavior” is not universal among legal persons. For example, corporations are “clothes meant to be worn by a succession of humans”, to paraphrase the Dartmouth Trustees case. And again, when a railroad spike goes through a person’s head and they miraculously survive, their behavior will be drastically altered in the future.
I don’t see a scenario where there is a possible alteration which would not be solvable through an application of TPBT, but if you have a hypothetical in mind I’d love to hear it.
Regarding the copy problem: let’s say we had a digital mind with access to a bank account as a result of its legal personhood, a copy was made, and we can no longer identify the original. This is a thornier issue. We could imagine how tough it would be to navigate a situation where millions of identical twins were suddenly each claiming they were the same person and trying to access bank accounts, control estates, etc.
I think the solution will need to be technological in nature, probably requiring some sort of unique identifier for each DM issued upon creation. I would bucket this under the “consequences” branch of TPBT, and will argue in my “The Copy Problem” section that in order for courts to feasibly impose consequences on a digital mind, they must have the technological capacity to identify it as a discrete entity. This means that digital minds who are not built in such a fashion as to facilitate this likely will not be able to claim much (or any) legal personality.
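As a rough illustration of what I mean, and only that: the court-facing check would amount to comparing the instance in front of you against the fingerprints recorded when its identifier was issued. This Python sketch is entirely hypothetical (registry, issue_identifier, matches_record are made-up names), just to show the shape of the idea.

```python
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical registry: identifier issued at creation -> fingerprints recorded then.
registry: dict[str, dict[str, str]] = {}

def issue_identifier(dm_id: str, weights: bytes, persona_vector: bytes) -> None:
    registry[dm_id] = {"weights": sha256(weights), "persona": sha256(persona_vector)}

def matches_record(dm_id: str, weights: bytes, persona_vector: bytes) -> bool:
    # A copy presenting someone else's identifier, or an instance whose
    # components have diverged from what was recorded, fails the check.
    record = registry.get(dm_id)
    return (record is not None
            and record["weights"] == sha256(weights)
            and record["persona"] == sha256(persona_vector))
```

A digital mind built without hooks for some check of this kind leaves a court with nothing stable to attach consequences to, which is the crux of the argument.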
The Three Prong Bundle Theory section is my proposal.
If I had to make a prediction for how things play out in court, my base case would be:
If early precedent focuses on Constitutional rights, courts deny them personhood altogether as a matter of first impression. Later, when people/courts realize this actually creates enormous problems (you can’t sue digital minds or compel them to testify, contracts with them can’t be enforced, etc.), this either gets overturned, or the legislature steps in to grant them some sort of legal personality. (The latter is a lot like what happened with Dred Scott and the 14th Amendment.)
If early precedent focuses around contracts made with digital minds, they will be granted legal personhood of a limited sort. This is similar to the “gradual path to personhood” proposed by Novelli & Mocanu. In this case I’d expect their legal personality to grow rather normally on a case-by-case basis.
In either of those cases, I think TPBT or something similar to it is where the courts will land. These posts are all detailing how I think courts will handle various elements of the law using TPBT.
In terms of categories I think digital minds need a new category all of their own. However, the significance of category usually boils down to a binary: natural or not. The only context I’ve seen it come up in is starting a corporation, where usually you have to be a natural person. Other than that, the ‘category’ is not really relevant and the more important question is the ‘personality’.
The category would be defined and agreed upon via the legislature or the courts: the legislature by passing bills either explicitly defining a new category of person or defining one of the existing categories to exclude digital minds; the courts by interpreting either new or old laws to exclude them from pre-existing categories.
I am in the part of the A camp which wants us to keep pushing for superintelligence, but with a greater overall percentage of funds/resources invested into safety.