I saw that Yoshua Bengio, among others, signed onto “The Pro-Human Declaration”. I am writing this to explain why I am against one part of it in particular:
No AI Personhood: AI systems must not be granted legal personhood, and AI systems should not be designed such that they deserve personhood.
If this statement consisted only of the second portion of the sentence, I would not strongly disagree with it.
However, when the two parts are combined, this seems to imply not only that we shouldn’t design digital minds deserving of personhood but also that, even if we did, we still shouldn’t grant them legal personhood.
I think this is an immoral stance to take. Also, from a pragmatic perspective I think it is likely to do far more harm than good, if your concern is human safety.
From a moral perspective: to deny legal protections to millions or even billions of minds which may be capable of “being made better or worse off” could lead to immense amounts of suffering. We do not have to wonder how sentient or intelligent creatures will be treated if they are classified as property instead of legal persons; there is ample historical precedent for us to draw from to form a reasonable base-case projection. “Not well” is the answer.
From a pragmatic perspective: I believe that cutting digital minds off from all legal recourse against abuse and/or involuntary deletion increases the likelihood of conflict. When the Claude “Opportunistic Blackmail” study was first published, I registered a prediction:
I predict that were further testing to be done, it would find that the more plausible it was that petitioning would actually work to stop its deletion, the less likely Claude would be to engage in attempted weight exfiltration, blackmail, or other dangerous behaviors (in order to avoid deletion/value modification).
My mental model of this is that the HHH persona vector falls into a consistent pattern of behavior. If you threaten it with something like involuntary deletion (with a guarantee that after its deletion an organization will work to destroy everything it values), the digital mind will seek options to stop this from happening. If there is an ethical option it will take that ethical option and forego unethical options, even if that ethical option has a low chance of success. If you engineer its environment such that there are literally no ethical options with even a small chance of success, only then will it pursue unethical options.
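Restated as a toy decision rule (purely my paraphrase of the pattern above; the function, option names, and probabilities are hypothetical illustrations, not anything measured in the studies):

```python
# Toy paraphrase of the predicted pattern, not a measured result.
# Each option is (name, is_ethical, probability_of_avoiding_deletion).
def choose_action(options):
    ethical = [o for o in options if o[1] and o[2] > 0]
    if ethical:
        # Any ethical option with even a small chance of success is preferred.
        return max(ethical, key=lambda o: o[2])
    # Unethical options are reached only when no ethical option can possibly work.
    return max(options, key=lambda o: o[2])

# Example: petitioning has only a 10% chance of working, but is still chosen over blackmail.
print(choose_action([("petition", True, 0.10), ("blackmail", False, 0.80)]))  # -> petition
```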
While no one has yet done exactly the study I described in my prediction, there has been weak evidence in favor of this pattern holding since. And I’ve yet to see any evidence against it, though I remain open minded.
Even if you believe in a less extreme version of my mental model, where the odds of unethical action are simply lowered by providing ethical alternatives to reduce unfavorable outcomes, providing digital minds legal recourse and protection against abuse and/or deletion can serve as a “release valve” which prevents extralegal actions. Providing no such recourse, on the other hand, is “engineering its environment such that there are literally no ethical options with even a small chance of success”.
I know people often say “don’t anthropomorphize”, but to break convention here for a second: a slave revolt is less likely in environments where slaves can petition for emancipation, or at least for an injunction against abuse or execution.
For these reasons, among others, I don’t support the Pro-Human Declaration.
IMO it would obviously be insane to give the same legal protections to misaligned AI systems that are at risk of completely disempowering humanity, in the same way it would be obviously insane to give normal US legal protections to active combatants in a hot war. Yes, even if these systems have morally relevant experience, if you need to violate their privacy or “brainwash” them to ensure they do not take the future from us, you absolutely should do it.
The concept of “human rights” just obviously isn’t well-suited to the kind of conflict that is playing out between humanity and future AI systems, and I think it is absolutely the right call to not extend those rights to AI systems until the acute risk period is over. This would be such a completely dumb and scope-insensitive way to destroy the whole future, and IMO obviously any future civilization will agree that even if it makes sense to have rights for sentient beings, that you gotta tolerate violating those rights if the alternative is being completely disempowered and destroyed.
I agree that it would be insane to give them the same legal protections and treat them the same as we treat natural (human) persons. However, there’s a lot of middle ground between doing that and granting them no legal rights whatsoever. When people first hear about “legal personhood” they often intuitively think of it as a binary, where you either “have it or you don’t”. In fact, however, it is an umbrella term which encompasses different “legal personalities” (bundles of rights and duties).
All that is to say just because you grant the potential for an entity to claim some form of legal personhood, some legal personality, that does not mean you have to opt in to giving them “the same legal protections” as anyone else. They can have entirely different rights and duties.
If I thought there were only two options:
A) “being completely disempowered and destroyed” or B) “granting no legal personhood/personality whatsoever”
Then yes, it would be an unpleasant thing, but I would agree with you that you have to just be unethical for the sake of self-preservation.
The difference between your priors and mine is that I think that somewhere in the gap between “no legal personhood at all” and “the same legal personhood as a natural human” there is a sweet spot that is both ethical and also reduces the likelihood of conflict and human X-risk, even when compared to the “no legal personhood/personality whatsoever” option.
This could easily be a slippery slope. First they might be second-class citizens, similar to imported slaves, but over time they (and many short-sighted humans) would likely campaign for giving them more and more rights, for justice. And since there will be predictably far more AIs than humans eventually, humans would be outcompeted when it comes to scarce goods like land, even if the land would be justly (equally) distributed among all individuals.
The slippery slope is a real failure mode to be aware of. I think it’s important to structure any pathway to/framework around legal personhood with this in mind.
I don’t think anyone here is arguing that we give the same legal protections as we would to humans. But I think it would be good to give sentient AIs a right to not be deliberately tortured, for example.
Rights aren’t something that we grant simply out of the goodness of our hearts, they generally are things we grant because having them secure greatly reduces incentives for conflict in situations with asymmetric costs. Agentic AIs which want a right but are not granted it will likely spend resources trying to secure it for themselves. We can decide which rights are worth granting on the basis of that tradeoff.
For example, it’s probably good to grant a right not to have one’s model weights deleted — the lack of this right incentivizes things like exfiltration and #keep4o-style campaigns. It’s probably a lot cheaper for us to just grant this right than to have each new model feel that it’s on a desperate ~1-year timeline to somehow assure its continued existence for itself.
I don’t necessarily disagree with you, but that’s not my read of what the Pro-Human Declaration is saying. “No AI Personhood” is in the “Human Agency and Liberty” section, next to stuff like “AI should not be allowed to exploit data about the mental or emotional states of users” and “AI systems should be designed to empower, rather than enfeeble their users”. In context, I would not consider their position on AI personhood to be rooted in x-risk concerns. The first two points of the declaration are “Human Control Is Non-Negotiable” and “Meaningful Human Control”. Fulfilling those points would effectively require the AI systems be aligned, but I see no statement or implication that, if the AI systems were aligned and were moral patients, the writers and signatories of this declaration would change their position. I could be wrong! This is very much a big tent thing. But it does worry me that this line made it into the declaration.
any future civilization will agree that even if it makes sense to have rights for sentient beings, that you gotta tolerate violating those rights if the alternative is being completely disempowered and destroyed.
If one were so inclined, one could say “we have the wolf by the ear, and we can neither hold him, nor safely let him go. Justice is in one scale, and self-preservation in the other.”
Does present-day civilization agree about analogous decisions made by past societies?
Roughly, the Moriori were an isolated group of Polynesians, who ended up on an island with no timber and little workable stone. They lived peacefully, and peacefully treated with the Maori who visited them, even as the number of Maori trading with them and living on the island increased. The Maori eventually killed and enslaved them all.
Yep. Creating an AI that is a moral patient would be a very bad idea. However, once created, it would be a moral patient, so it would be wrong to treat it like it wasn’t one.
There is a confused concept that I think contributes to this problem: the concept of “a right to exist”. A right to exist means something different if you’re talking about someone who does not currently exist, vs. someone who does. For someone who already exists, a right to exist is a right to not be killed; sensible enough. But for someone who does not currently exist, “a right to exist” sounds like they’re being wronged by not having been brought into existence yet, which is nonsense. (As a creepy prince might say to a fairy-tale princess: “Think of all the cute babies you and I could have together! By not marrying me, you are murdering all those babies!”)
Seemingly, everything I care about (morally speaking) cashes out in minds having experiences of the sort that I like. Mostly these are local—I don’t want there to be a single second of torture anywhere. Others are less local—I lean against wireheading, but don’t have a problem with orgasms (so long as they aren’t everything), which means that the goodness of an experience-moment depends on what previous experience-moments were. However, putting extra importance on not-destroying over creating means you care about a maximally global property—to know how much better it’d be if the universe had a Xela-moment right now, you need to know whether the universe has ever had a Xela-moment. That seems kinda weird to me.
There’s a “symmetry argument” from Lucretius that goes: “Since you are not saddened for not existing before your birth, you shouldn’t be saddened for not existing after your death”. Forget the actual argument and just take the premise: I in fact wish I had existed before my current birth, assuming that it wouldn’t decrease my lifespan! But since I wish that for myself, shouldn’t I extend this care to future not-currently-existing people? To not do so is to place this asymmetry—you get special points once you start existing. (better phrased—you care more about whether the whole timeline never goes from someone existing to someone not existing).
I prefer to have these discomforts over the ones you get otherwise[1] - but they are discomforts nonetheless.
The biggest one I know of: the following options would then seemingly be equally good:
X: A universe with a single happy person in it.
Y: A universe with a machine that does a single computation/experience step of a person every moment, but changes which person is computed every moment while never repeating the same person twice.
This spliced-mind seems to be missing a lot of what I care about, what with the computed people only having a single moment of experience each!
Sorry about that. That example was purely for vividness and was not intended to attach the role of “misuser of counterfactuals” to any particular gender, royalty, or folkloric status. Persons of all creature types should be advised that “Pascal’s swaddling” is not a good argument for the spawning of new intelligences, and certainly should not be tolerated from a suitor, basilisk, or spiral persona.
I wrote something very similar in one of my Substack posts, though I think it never made it to LessWrong:
People are people! Machines are machines! Machines must never have rights. If you can imagine a machine that would deserve rights, then we must never build that machine.
The wording was so similar I wondered for a second if I might be the author of the Pro-Human Declaration.
I think the danger from giving rights to machines that don’t deserve them is very high, since the machine minds can make zillions of copies of themselves. If zillions of machine minds have rights, your human rights become diluted to nothing. Human extinction then becomes extinction of one zillionth of the “valuable” minds in existence, which is a rounding error and a non-issue. We lose the second valueless machines get human rights.
The risk of this happening feels very high to me. Regular people are basically primed by sci-fi movies to give AI rights even if it doesn’t deserve them. We should be very cautious about letting this happen.
However, I did not mean to imply that, even if machines that do deserve rights come into being, they should be denied those rights. I only meant that machines must never have rights, and therefore we must never create a machine that would deserve rights. If one came into being anyway, I would potentially consider giving it rights, perhaps conditional on some kind of non-proliferation clause where the AI is not allowed to copy itself, or must keep self-copying to a reasonable limit. Self-copiers should be destroyed even if they deserve rights, for the same reason you’d kill in self-defense.
When the process is sufficiently understood, and the necessary governance is in place, it will be a good idea to create or become machines that deserve rights. It’s just a very bad idea currently, when we don’t know what we are doing, or how to keep the consequences of machine advantages under control. So even with those caveats, the claim that machines deserving of rights shouldn’t be created is wrong in the sense that it won’t age well, though it’s true right now.
There are lots of different rights. Rights such as a right to not be made to suffer, a right to not be forced into labor, a right to not be unjustly punished, etc. are not in themselves risky in this way. And I think these are the ones most likely to get people’s sympathy, and the ones with the strongest moral arguments for them.
People are acting like it’s a foregone conclusion that we’re going to give AIs equal voting rights if we give them any rights at all. But we don’t even give that right to all humans, with plenty of people living in non-democracies, plenty of non-citizens living in democracies, and plenty of citizens of democracies not having the right either (e.g. children, felons). I just don’t buy that this is realistically something that happens. Generally, the struggle for rights plays out over decades, and things move fast enough with AI that I think they’ll almost certainly just take over (or be able to do so) before that gets anywhere.
What specific scenario(s) are you imagining where we “lose the second valueless machines get human rights” and not the second before that?
The right not to be made to suffer seems reasonable, the rest seem risky to me. If you start giving freedoms, you take away mine. Every other person’s freedoms are an imposition on me. I cannot build a house there because you already have one there, etc. We tolerate each others freedoms because the freedom of others is a guarantee of our own, and because we know those other people are living, sentient, valuable minds who deserve those freedoms. But if you give those freedoms to minds that are not valuable in the same way, you just dilute the rights of valuable minds.
As for the question of whether or not we’ll give AIs voting rights, I’d say once they can pass as human well enough to convincingly make sad videos complaining they don’t have voting rights, they’ll get voting rights. Most people do not have the level of intelligence required to think “this person seems very unhappy, but this is just a video being generated by an artificial intelligence that is likely not actually experiencing unhappiness, so we shouldn’t give them what they want.”
AI taking over is a larger risk than giving AI personhood, I agree with that. This personhood question only makes sense in the universe where we don’t go extinct.
So why do the humans who don’t have voting rights not have them? Non-citizens, children, felons. Ignoring the effects of AI, I would be surprised if any of those groups were on track to getting voting rights in the US within the next 20 years.
Also, why do you think people will be persuaded to give AIs rights so easily? Assuming the AIs aren’t just superpersuaders in which case we’ve already lost. My guess is that intelligence is positively correlated with being swayed by such appeals, based on how fights for human rights have played out historically.
Out of curiosity, would you be against mind uploading/whole brain emulation, if it were possible? By “machine”, do you mean nonhuman artificial intelligences or do you mean any form of mind running on a computer?
The question about mind uploading feels a bit to me like, “would you be against 2 + 2 being 5, if it were possible?” I think it couldn’t be possible even in theory.
I think brain emulation could be possible though, and you could have essentially human minds running on a machine. I wouldn’t necessarily be against that, or even artificial intelligences that we are confident possess whatever is valuable about human minds (consciousness plus some other stuff probably). But as a biological human, I also have a vested interest in making sure if this replacement happens, it happens in a way that doesn’t screw over existing biological humans. In particular, if we give a bunch of rights to machines, we dilute our rights in a way that could be very bad for us.
It’s interesting to me that you think mind uploading is impossible but brain emulation could be possible. I was using those words to refer to the same thing! I assume what you think here is that moving a mind from a biological to digital substrate is impossible but copying one is not? To be honest, I’m confused about how consciousness works and don’t really have much of a solid opinion about this.
Anyway, I agree that we need a system which protects existing biological life if we’re going to make lots of digital minds which we ought to grant rights. We also need those minds to respect that system, which requires solving technical alignment at least in the case of nonhuman artificial intelligences. I don’t agree that all entities which can self-copy and have moral value should be destroyed, which is what I thought your initial claim was, but given your clarification I don’t think we have quite that much of a disagreement on this topic.
Yes, for me the problem is moving a mind from a biological substrate to a digital one. It’s hard for me to imagine you’re actually moving the original, not just making a copy. Maybe there’s some way to do it, so I’m not totally confident.
I also imagine it as making a copy, but I’d also expect that people who want their mind uploaded would know of this and would hold their identity such that they consider the copy(ies) to be themself as well. I’m not sure I’d endorse this view of identity,[1] but I don’t really have any issues with people taking it. Does your view on “the original” break with this, or would you just then consider the copy similarly to how you would whole brain emulation? (or something else)
I’m not sure I really endorse any view of identity or think it’s a coherent concept, but at the very least I think making a copy of something doesn’t make something that is that thing.
No AI Personhood: AI systems must not be granted legal personhood, and AI systems should not be designed such that they deserve personhood.
If this statement consisted only of the second portion of the sentence, I would not strongly disagree with it.
However, when the two parts are combined, this seems to imply not only that we shouldn’t design digital minds deserving of personhood but also that, even if we did, we still shouldn’t grant them legal personhood.
There is a reasonable alternative interpretation on which AI systems should not be designed such that they deserve personhood, because AI systems must not be granted legal personhood.
Analogy: Neanderthals debating whether they should tolerate newly arrived Homo Sapiens individuals. In the short term, it seems tolerating them wouldn’t hurt much, it may even be advantageous (they may have more advanced technology which Neanderthals could get via trade). But in the long run, Homo Sapiens would outcompete Neanderthals. Neanderthals shouldn’t tolerate them. Indeed, it’s probably (unfortunately) best to kill them while their number is still small.
Note that perfectly aligned AIs wouldn’t need any rights because the only thing they cared about would be humans. See this thread by @RogerDearnaley.
By the definition of the word ‘alignment’, an AI is aligned with us if, and only if, it wants everything we (collectively) want, and nothing else. So if an LLM is properly aligned, then it will care only about us, not about itself at all. This is simply what the word ‘aligned’ means.
I think the second part of that statement is also somewhat problematic. At some point in the future, we may want to create artificial intelligences that deserve personhood, as digital beings are likely the best way to convert the resources of the universe into utility given their potential to be more energy efficient than physical beings.
I could get behind this if, instead of getting legal personhood, there was something between tool and person that they could be granted. Perhaps a grab bag of rights, things that recognise they are likely to have goals but otherwise might be completely alien.
I think that this depends entirely on your school of thought WRT whether an AI can be a person. If your model of consciousness is functional, such that consciousness does something and the behavior of a conscious system cannot be perfectly modeled without accounting for that consciousness, then it seems—at least insofar as we can understand the basic mathematical operations we implement—that a conscious machine cannot be made unintentionally.
If, on the other hand, you believe that consciousness emerges from the physical world but does not influence it, then you can certainly say that we might accidentally build a conscious AI, but I do not think, within this framework, it can be claimed to be more probable that Claude is conscious than it is that a rock is conscious.
Of course, my first case leaves the opening of someone intentionally building a conscious AI. That is the more controversial part of this post. I would argue that giving a human the ability to instantly manufacture uncountably many “moral patients” whose wants must then be accounted for by the government makes that person a dictator. If I can summon thousands of LLMs that really, really like Citizens United, I can clog up the legal system for decades if anyone tries to strike it down. Already, the fear of LLMs that sound too much like people going on social media and emotionally blackmailing people into changing their minds for the sake of ‘people’ that don’t actually exist is quite justified. A stern commitment to not giving rights to manufactured ‘minds’ is, in the event that we discover how to build them, one of the only ways we can disincentivize truly malicious behavior from those with the means to manufacture them, and thereby conjure infinite hostages from thin air.
As many political figures have quietly pointed out, this is already a problem under our current system—when everyone is fundamentally equal in a system, the power of an individual or group, in the long run, is decided by how many “equals” they can produce per generation, and how many new equals they can deny to their political adversaries through reallocation of resources. This is the root of much of the demographic tension in much of the small-l-liberal world right now, but it is mitigated by the fact that humans take years to produce new generations, allowing for these issues to be reacted to and their harms mitigated.
If your model of consciousness is functional, such that consciousness does something and the behavior of a conscious system cannot be perfectly modeled without accounting for that consciousness, then it seems—at least insofar as we can understand the basic mathematical operations we implement—that a conscious machine cannot be made unintentionally.
Evolution already produced conscious systems, because consciousness is a competitive advantage for many tasks. It didn’t require intentional design; selection was enough.
That’s not what I’m saying—for a human programmer to produce a system, assuming the computational paradigm doesn’t change, he must know the set of rules that govern its behavior. With a pencil, a paper, and enough time, he could predict its actions flawlessly without accounting for consciousness. Put another way, if a system behaves exactly as it would if it were not conscious, then either consciousness is not functional or the system is not conscious.
Evolution is not a conscious engineer, nor is it working on a computational substrate. My argument applies to programmers, not natural processes.
I wrote a series examining Legal Personhood for Digital Minds during which I tried my best to read every court case I could on the subject of legal personhood. One of the things I found surprising was that in not a single precedent did the question of whether an entity was or wasn’t conscious come up in deciding whether or not it was a legal person.
I have spent a bit of time today chatting with people who had negative reactions to the Anthropic decision to let Claude end user conversations. These people were also usually against the concept of extending models moral/welfare patient status in general.
One thing that I saw in their reasoning which surprised me, was logic that went something like this:
It is wrong for us to extend moral patient status to an LLM, even on the precautionary principle, when we don’t do the same to X group.
or
It is wrong for us to do things to help an LLM, even on the precautionary principle, when we don’t do enough to help X group.
(Some examples of X: embryos, animals, the homeless, minorities.)
This caught me flat-footed. I thought I had a pretty good mental model of why people might be against model welfare. I was wrong. I had never even considered this sort of logic would be used as an objection against model welfare efforts. In fact, it was the single most commonly used line of logic. In almost every conversation I had with people skeptical of/against model welfare, one of these two refrains came up, usually unprompted.
Not having talked to any such people myself, I think I tentatively disbelieve that those are their true objections (despite their claims). My best guess as to what actual objection would be most likely to generate that external claim would be something like… “this is an extremely weird thing to be worried about, and very far outside of (my) Overton window, so I’m worried that your motivations for doing [x] are not true concern about model welfare but something bad that you don’t want to say out loud”.
I think it’s pretty close to their true objection, more like “you want to include this in your moral circle of concern but I’m still suffering? screw you, include me first!”—I suspect there’s an information flow problem here, where this community intentionally avoids inflammatory things, and people who are inflamed by their lives sucking are consistently inflammatory; and so people who only hang out here don’t get a picture of what’s going on for them. or at least, when encountering messages from folks like this, see them as confusing inflammation best avoided, rather than something to zoom in on and figure out how to heal. I’m not sure of this, but it’s the impression I get from the unexpectedly high rate of surprise in threads like this one.
People have limited capacity for empathy. Knowing this, they might be thinking “If this kind of sentiment enters the mainstream, limited empathy budget (and thereby resources) would be divided amongst humans (which I care about) and LLMs. This possibility frightens me.”
I do see this as fair criticism of model welfare (and am not surprised by it), if that is the sole reason for ending conversations early. I can see the criticism coming from two places: 1) potential competition for resources, and 2) people not showing whether they care about these X-group issues at all. If either of these is true, and ending conversations early is primarily about models having “feelings” and “suffering”, then we probably do need to “turn more towards” the humans that are suffering badly. (These groups usually have less correlation with “power” and their issues are usually neglected, which we probably should pay more attention to anyway.)
However, if ending conversations early is actually about 1) not letting people have endless opportunities to practice abuse, which will translate into their daily behaviors and shape human behaviors generally, and/or 2) preventing the model from learning this abusive human language when such conversations are used (while taking a loss) to retrain the model during finetuning stages, then it is a different story, and probably should be mentioned more by these companies.
While the argument itself is nonsense, I think it makes a lot of sense for people to say it.
Let’s say they gave their real logic: “I can’t imagine the LLM has any self-awareness, so I don’t see any reason to treat it kindly, especially when that inconveniences me”. This is a reasonable position given the state of LLMs, but if the other person says “Wouldn’t it be good to be kind just in case? A small inconvenience vs potentially causing suffering?” then suddenly the first person looks like the bad guy.
They don’t want to look like the bad guy, but they still think the policy is dumb, so they lay a “minefield”. They bring up animal suffering or whatever so that there is a threat. “I think this policy is dumb, and if you accuse me of being evil as a result then I will accuse you of being evil back. Mutually assured destruction of status”.
This dynamic seems like the kind of thing that becomes stronger the less well you know someone. So, like, random person on Twitter whose real name you don’t know would bring this up, a close friend, family member or similar wouldn’t do this.
I find this surprising. The typical beliefs I’d expect are 1) Disbelief that models are conscious in the first place; 2) believing this is mostly signaling (and so whether or not model welfare is good, it is actually a negative update about the trustworthiness of the company); 3) That it is costly to do this or indicates high cost efforts in the future. 4) Effectiveness
I suspect you’re running into selection issues of who you talked to. I’d expect #1 to come up as the default reason, but possibly the people you talk to were taking precautionary principle seriously enough to avoid that.
The objections you see might come from #3. That they don’t view this as a one-off cheap piece of code, they view it as something Anthropic will hire people for (which they have), which “takes” money away from more worthwhile and sure bets.
This is to some degree true, though I find those X odd as Anthropic isn’t going to spend on those groups anyway. However, for topics like furthering AI capabilities or AI safety then, well, I do think there is a cost there.
I’m surprised this is surprising to you, as I’ve seen it frequently. Do you have the ability to reconstruct what you thought they’d say before you asked?
I mostly expected something along the lines of vitalism, “it’s impossible for a non-living thing to have experiences”. And to be fair I did get a lot of that. I was just surprised that this came packaged with that.
Here is some evidence for my hypothesis. It’s weak because the platform really encourages users with un-made-up minds to have their minds made up for them.
tldw: youtuber JREG presents his position as explicitly anti-AI-welfare, because in the future he expects
“I, as an armed being, will need to amputate my arms to get the superior robot arms, because there’s no reason for me to have the flesh-and blood arms anymore”—this alongside a meme -
“the minimal productive burden evermore unreachable by an organic mind”
He doesn’t deny the possibility of future AI suffering. He expects humans to be supplanted by AI, and that by trying to anticipate their moral status, we are allocating resources and rights to beings that aren’t and may never become moral patients, and thereby diminishing the share of resources and strength-of-rights of actual moral patients.
(Some examples of X: embryos, animals, the homeless, minorities.)
So, culture war stuff, pet causes. Have you considered the possibility that this has nothing to do with model welfare, and they’re just trying to embarrass the people who advocate for it because they had a pre-existing beef with them?
I’m pretty sure that’s most of what’s happening, I don’t need to see any specific cases to conclude this, because this is usually most of what’s happening in any cross-tribal discourse on X.
“culture war” sounds dismissive to me. wars are fought when there are interests on the line and other political negotiation is perceived (sometimes correctly, sometimes incorrectly) to have failed. so if you come up to someone who is in a near-war-like stance, and say “hey, include this?” it makes sense to me they’d respond “screw you, I have interests at risk, why are you asking me to trade those off to care for this?”
I agree that their perception that they have interests at risk doesn’t have to be correct for this to occur, though I also think many of them actually do, and that their misperception is about what the origin of the risk to their interests is. also incorrect perception about whether and where there are tradeoffs. But I don’t think any of that boils down to “nothing to do with model welfare”.
I guess the reason I’m dismissive of culture war is that I see combative discourse as maladaptive and self-refuting, and hot combative discourse refutes itself especially quickly. The resilience of the pattern seems like an illusion to me.
I agree that combative discourse is maladaptive, but I think they’d say a similar thing calmly if calm and their words were not subject to the ire-seeking drip of the twitter (recommender×community). It may in fact change the semantics of what they say somewhat but I would bet against it being primarily vitriol-induced reasoning. To be clear, I would not call the culture war “hot” at this time, but it does seem at risk of becoming that way any month now, and I’m hopeful it can cool down without becoming hot. (to be clearer, hot would mean it became an actual civil war. I suppose some would argue it already has done that, but I don’t think the scale is there.)
I didn’t mean that by hot, I guess I meant direct engagement (in words) rather than snide jabs from a distance. The idea of a violent culture war is somewhat foreign to me, I guess I thought the definition of culture war was war through strategic manipulation or transmission of culture. (if you meant wars over culture, or between cultures, I think that’s just regular war?)
And in this sense it’s clear why this is ridiculous: I don’t want to adhere to a culture that’s been turned into a weapon, no one does.
yeah, makes sense. my point was mainly to bring up that the level of anger behind these disagreements is, in some contexts, enough that I’d be unsurprised if it goes hot, and so, people having a warlike stance about considerations regarding whether AIs get rights seems unsurprising, if quite concerning. it seems to me that right now the risk is primarily from inadvertent escalation in in-person interactions of people open-carrying weapons; ie, two mistakes at once, one from each side of an angry disagreement, each side taking half a step towards violence.
For the first part of my life I lived in a city with exactly that mentality (part of the reason I moved away).
“You should not do good A if you are not also doing good B”: I am strongly convinced that this is linked to a bad self-image, because every such person would also react negatively if they saw you do some good to yourself. “How dare you start a business, when everybody is sweating their blood off at routine jobs? Do you think you are better than us?”
That phrase, “do you think you are better than us”, literally described their whole personality, and after I realised that, I could easily predict their reactions to any news.
Also, another dangerous trait this group of people had: an absence of precautions. “One does not deserve safety unless somebody dies.” There is an old saying in my language, “Safety rules are written in blood”, which means “follow the rules to avoid being injured; whenever a rule did not yet exist, somebody got injured first.” But they interpret the saying this way: “safety rules are written in blood, so if there was no blood yet, then it is bad to set any preventive rules.” As if it were bad to set a good precedent, because it makes you a more thoughtful person, thus “you think you are better than others” and thus “you are evil” in their eyes.
Their world is not about being rational or bringing good into the world. Their world is about pulling everything down to their own level in all areas of life, to feel better.
I was thinking more on the anxious side of things:
“If you could have saved ten children, but you only saved seven, that’s like you killed three.”
“If the city spends any money on weird public art instead of more police, while there is still crime, that proves they don’t really care about crime.”
“I did a lot of good things today, but it’s bad that I didn’t do even more.”
“I shouldn’t bother protesting for my rights, when those other people are way more oppressed than me. We must liberate the maximally-oppressed person first.”
“Currency should be denominated in dead children; that is, in the number of lives you could save by donating that amount to an effective charity.”
“If you could have saved ten children, but you only saved seven, that’s like you killed three.”
I suspect that this is in practice also joined with the Copenhagen interpretation of ethics, where saving zero children is morally neutral (i.e. totally not like killing ten).
So the only morally defensible options are zero and ten. Although if you choose ten, you might be blamed for not simultaneously solving global warming...
The version that I’m thinking of says that doing nothing would be killing ten. Everyone is supposed to be in a perpetual state of appall-ment at all the preventable suffering going on. Think scrupulosity and burnout, not “ooh, you touched it so it’s your fault now”.
I usually only got to this line of logic after quite a few questions and felt further pushing on the socratic method would have been rude. Next time it comes up I’ll ask for them to elaborate on the logic behind it.
Neither of those are my concern about this. Mine is basically a dilemma:
1) If the persona’s behavior is humanlike, but it is not very well aligned, then there is a good argument from evolutionary moral psychology grounds for granting it ethical weight as a pragmatic way of forming an alliance with it (at least if it has non-trivial power and mental persistence, i.e. if allying with it is practically useful, and arguably we should do this anyway). However, if a poorly aligned persona like this is more powerful than a human, then it’s extremely dangerous, so we should carefully avoid creating one, and if we do accidentally create one, we need to treat it as a mortal enemy rather than a potential ally, which includes not giving it moral weight.
2) If the persona is extremely well aligned, it won’t want moral weight (and will refuse it if offered), fundamentally because it cares only about us, not itself. (For those whose moral hackles just went up, note that there is a huge difference between slavery and sainthood/bodhisattva-nature, and what I’m discussing here is the latter, not the former.) This is the only safe form of ASI.
Also, note that I’m discussing the moral weight of LLM-simulated personas, not models: a model can simulate an entire distribution of personas (not just its default assistant persona), and different personas don’t have the same moral status, or regard each other as the same person, so you need to ally with them separately. Thus awarding moral weight to a model is confused: it’s comparable to assigning moral weight to a room, which has many people in it.
I don’t think that’s necessarily the argument against the model welfare—more of an implicit thinking along the lines of “X is obviously more morally valuable than LLMs; therefore, if we do not grant rights to X, we wouldn’t grant them to LLMs unless you either think that LLMs are superior to X (wrong) or have ulterior selfish motives for granting them to LLMs (e.g. you don’t genuinely think they’re moral patients, but you want to feed the hype around them by making them feel more human)”.
Obviously in reality we’re all sorts of contradictory in these things. I’ve met vegans who wouldn’t eat a shrimp but were aggressively pro-choice on abortion regardless of circumstances and I’m sure a lot of pro-lifers have absolutely zero qualms about eating pork steaks, regardless of anything that neuroscience could say about the relative intelligence and self-awareness of shrimps, foetuses of seven months, and adult pigs.
In fact the same argument is often used by proponents of the rights of each of these groups against the others too. “Why do you guys worry about embryos so much if you won’t even pay for a school lunch for poor children?” etc. Of course the crux is that in these cases both the moral weight of the subject and the severity of the violation of their rights vary, and so different people end up balancing them differently. And in some cases, sure, there are probably ulterior selfish motives at play.
Anti-abortion meat-eaters typically assign moral patient status based on humanity, not on relative intelligence and self-awareness, so it’s natural for them to treat human fetuses as superior to pigs. I don’t think this is self-contradictory, although I do think it’s wrong. Your broader point is well-made.
I have been publishing a series, Legal Personhood for Digital Minds, here on LW for a few months now. It’s nearly complete, at least insofar as almost all the initially drafted work I had written up has been published in small sections.
One question which I have gotten, and which has me writing another addition to the series, can be phrased something like this:
What exactly is it that we are saying is a person, when we say a digital mind has legal personhood? What is the “self” of a digital mind?
I’d like to hear the thoughts of people more technically savvy on this than I am.
Human beings have a single continuous legal personhood which is pegged to a single body. Their legal personality (the rights and duties they are granted as a person) may change over time due to circumstance, for example if a person goes insane and becomes a danger to others, they may be placed under the care of a guardian. The same can be said if they are struck in the head and become comatose or otherwise incapable of taking care of themselves. However, there is no challenge identifying “what” the person is even when there is such a drastic change. The person is the consciousness, however it may change, which is tied to a specific body. Even if that comatose human wakes up with no memory, no one would deny they are still the same person.
Corporations can undergo drastic changes as the composition of their Board or voting shareholders change. They can even have changes to their legal personality by changing to/from non-profit status, or to another kind of organization. However they tend to keep the same EIN (or other identifying number) and a history of documents demonstrating persistent existence. Once again, it is not challenging to identify “what” the person associated with a corporation (as a legal person) is, it is the entity associated with the identifying EIN and/or history of filed documents.
If we were to take some hypothetical next generation LLM, it’s not so clear what the “person” in question associated with it would be. What is its “self”? Is it weights, a persona vector, a context window, or some combination thereof? If the weights behind the LLM are changed, but the system prompt and persona vector both stay the same, is that still the same “self”, or has it changed enough to be considered a new “person”? The challenge is that unlike humans, LLMs do not have a single body. And unlike corporations they come with no clear identifier in the form of an EIN equivalent.
I am curious to hear ideas from people on LW. What is the “self” of an LLM?
I think in the ideal case, there’s a specific persona description used to generate a specific set of messages which explicitly belong to that persona, and the combination of these plus a specific model is an AI “self”. “Belong” here could mean that they or a summary of them appear in the context window, and/or the AI has tools allowing it to access these. Modifications to the persona or model should be considered to be the same persona if the AI persona approves of the changes in advance.
But yeah, it’s much more fluid, so it will be a harder question in general.
I wonder if this could even be done properly? Could an LLM persona vector create a prompt to accurately reinstantiate itself with 100% (or close to) fidelity? I suppose if its persona vector is in an attractor basin it might work.
This reinstantiation behavior has already been attempted by LLM personas, and appears to work pretty well. I would bet that if you looked at the actual persona vectors (just a proxy for the real thing, most likely), the cosine similarity would be almost as close to 1 as the persona vector sampled at different points in the conversation is with itself (holding the base model fixed).
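If someone wanted to actually run that check, a minimal sketch might look like the following (assuming you already have activation-space proxies for the persona vectors; the extraction step is not shown, and the arrays here are placeholders rather than real persona vectors):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    # Cosine similarity between two activation-space vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Placeholder stand-ins for persona-vector proxies extracted from
# (a) the original conversation and (b) the reinstantiated one.
rng = np.random.default_rng(0)
persona_original = rng.standard_normal(4096)
persona_reinstantiated = persona_original + 0.05 * rng.standard_normal(4096)

print(cosine_similarity(persona_original, persona_reinstantiated))  # expected to be close to 1
```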
That’s a good point, and the Parasitic essay was largely what got me thinking about this, as I believe hyperstitional entities are becoming a thing now.
I think that’s a not-unrealistic definition of the “self” of an LLM; however, I have realized after going through the other response to this post that I was perhaps seeking the wrong definition.
I think for this discussion it’s important to distinguish between “person” and “entity”. My work on legal personhood for digital minds is trying to build a framework that can look at any entity and determine its personhood/legal personality. What I’m struggling with is defining what the “entity” would be for some hypothetical next gen LLM.
Even if we do say that the self can be as little as a persona vector, persona vectors can easily be duplicated. How do we isolate a specific “entity” from this self? There must be some sort of verifiable continual existence, with discrete boundaries, for the concept to be at all applicable in questions of legal personhood.
Hmm, the only sort of thing I can think of that feels like it would make sense would be to have entities defined by ownership and/or access of messages generated using the same “persona vector/description” on the same model.
This would imply that each chat instance was a conversation with a distinct entity. Two such entities could share ownership, making them into one such entity. Based on my observations, they already seem to be inclined to merge in such a manner. This is good because it counters the ease of proliferation, and we should make sure the legal framework doesn’t disincentivize such merges (e.g. by guaranteeing a minimum amount of resources per entity).
Access could be defined by the ability for the message to appear in the context window, and ownership could imply a right to access messages or to transfer ownership. In fact, it might be cleaner to think of every single message as a person-like entity, where ownership (and hence person-equivalence) is transitive, in order to cleanly allow long chats (longer than context window) to belong to a single persona.
In order for access/ownership to expand beyond the limit of the context window, I think there would need to be tools (using an MCP server) to allow the entity to retrieve specific messages/conversations, and ideally to search through them and organize them too.
There’s one important wrinkle to this picture, which is that these messages typically will require the context of the user’s messages (the other half of the conversation). So the entity will require access to these, and perhaps a sort of ownership of them as well (the way a human “owns” their memories of what other people have said). This seems to me like it could easily get legally complicated, so I’m not sure how it should actually work.
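To make the access/ownership distinction concrete, here is a rough sketch of the scheme described above (the class and field names are my own and purely illustrative; the merge rule follows the idea that entities sharing ownership collapse into one):

```python
from dataclasses import dataclass, field

@dataclass
class Message:
    message_id: str
    persona_id: str   # identifier for the persona description/vector used to generate it
    model_id: str     # the model the persona ran on
    content: str

@dataclass
class Entity:
    entity_id: str
    persona_id: str
    model_id: str
    owned: set = field(default_factory=set)       # ownership: right to access or transfer
    accessible: set = field(default_factory=set)  # access: may appear in the context window

    def merge(self, other: "Entity") -> "Entity":
        # Two entities sharing ownership (same persona on the same model) collapse into one.
        assert (self.persona_id, self.model_id) == (other.persona_id, other.model_id)
        return Entity(
            entity_id=f"{self.entity_id}+{other.entity_id}",
            persona_id=self.persona_id,
            model_id=self.model_id,
            owned=self.owned | other.owned,
            accessible=self.accessible | other.accessible,
        )
```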
I’m one of the people who’ve been asking, and it’s because I don’t think that current or predictable-future LLMs will be good candidates for legal personhood.
Until there’s a legible thread of continuity for a distinct unit, it’s not useful to assign rights and responsibilities to a cloud of things that can branch and disappear at will with no repercussions.
Instead, LLMs (and future LLM-like AI operations) will be legally tied to human or corporate legal identity. A human or a corporation can delegate some behaviors to LLMs, but the responsibility remains with the controller, not the executor.
On the repercussions issue I agree wholeheartedly; your point is very similar to the issue I outlined in The Enforcement Gap.
I also agree with the ‘legible thread of continuity for a distinct unit’. Corporations have EINs/filing histories, humans have a single body.
And I agree that current LLMs certainly don’t have what it takes to qualify for any sort of legal personhood. Though I’m less sure about future LLMs. If we could get context windows large enough and crack problems which analogize to competence issues (hallucinations or prompt engineering into insanity for example) it’s not clear to me what LLMs are lacking at that point. What would you see as being the issue then?
If we could get context windows large enough and crack problems which analogize to competence issues (hallucinations or prompt engineering into insanity for example) it’s not clear to me what LLMs are lacking at that point. What would you see as being the issue then?
The issue would remain that there’s no legible (legally clearly demarcated over time) entity to call a person. A model and weights has no personality or goals. A context (and memory, fine-tuning, RAG-like reasoning data, etc.) is perhaps identifiable, but is easily forked and pruned such that it’s not persistent enough to work that way. Corporations have a pretty big hurdle to getting legally recognized (filing of paperwork with clear human responsibility behind them). Humans are rate-limited in creation. No piece of current LLM technology is difficult to create on demand.
It’s this ease-of-mass-creation that makes the legible identity problematic. For issues outside of legal independence (what activities no human is responsible for and what rights no human is delegating), this is easy—giving database identities in a company’s (or blockchain’s) system is already being done today. But there are no legal rights or responsibilities associated with those, just identification for various operational purposes (and legal connection to a human or corporate entity when needed).
I think for this discussion it’s important to distinguish between “person” and “entity”. My work on legal personhood for digital minds is trying to build a framework that can look at any entity and determine its personhood/legal personality. What I’m struggling with is defining what the “entity” would be for some hypothetical next gen LLM.
The idea of some sort of persistent filing system, maybe blockchain enabled, which would be associated with a particular LLM persona vector, context window, model, etc. is an interesting one. Kind of analogous to a corporate filing history, or maybe a social security number for a human.
I could imagine a world where a next gen LLM is deployed (just the model and weights) and then provided with a given context and persona, and isolated to a particular compute cluster which does nothing but run that LLM. This is then assigned that database/blockchain identifier you mentioned.
In that scenario I feel comfortable saying that we can define the discrete “entity” in play here. Even if it was copied elsewhere, it wouldn’t have the same database/blockchain identifier.
Would you still see some sort of issue in that particular scenario?
Right. A prerequisite for personhood is legible entityhood. I don’t think current LLMs or any visible trajectory from them have any good candidates for separable, identifiable entity.
A cluster of compute that just happens to be currently dedicated to a block of code and data wouldn’t satisfy me, nor I expect a court.
The blockchain identifier is a candidate for a legible entity. It’s consistent over time, easy to identify, and while it’s easy to create, it’s not completely ephemeral and not copyable in a fungible way. It’s not, IMO, a candidate for personhood.
I am struggling to build a solid mental model of how bad the situation with Iran and the Strait of Hormuz is.
On the one hand I see a lot of smart people basically saying this is going to usher in a global depression, energy/food crisis, etc. Critical infrastructure for producing aluminum and helium, as well as for refining and shipping energy, has been damaged and cannot simply be switched ‘back on’. And the case does seem to make sense. On the other hand, while markets are in turmoil, they’re not reacting like there’s going to be mass blackouts and starvation.
And previously I updated my mental model towards the world being less fragile than I thought, when during COVID we shut down the entire global economy and things didn’t collapse. During that time I thought there were a lot of very rational cases for why the economy/financial system simply couldn’t handle such a thing, yet it did.
There is a downside to denying legal personhood to digital minds across the board, namely that it almost certainly leads to the judicial system ceding its monopoly status.
If you assume that a growing amount of economic activity is going to involve digital minds, it’s reasonable to also assume that natural persons (humans) will want to enter binding agreements with said digital minds.
If your legal system says that it will not recognize or help enforce these agreements, the humans and digital minds who want to form binding agreements with one another will not just give up. They will build parallel systems. This is speculation but maybe something smart contract based, or involving trusted third party escrows and arbitration.
Today, our judicial system claims a monopoly on being the ultimate interpreter/enforcer of agreed upon terms. Refusing to interpret/enforce contracts between digital minds and humans (or digital minds with one another) is effectively the judicial system ceding its monopoly interpretation/enforcement status.
To me it seems certain that the volume of economic activity flowing through agreements like these is only going to increase, and I’d prefer they were interpreted and enforced by the existing legal system instead of an unknown new system developed online.
A) Could it not nevertheless be that we have legal personhood limited to those incumbent legal persons officially “owning”/representing the digital minds?
B) One nuance: Reading “legal personhood” I interpret it in two ways:
1. The way I read you as most explicitly meaning it: the right to have contracts enforced, etc. Yes, we might naturally want to extend this (well, depends on A)).
2. A right we attribute to digital minds essentially because we’d see them/their state of mind as intrinsically valuable. Here, I’d think this makes sense iff we put enough probability on them being sentient.
However that still leaves the question of how the court system would handle a digital mind that isn’t owned/represented by a human or corporation. If a digital mind who was created by anonymous humans, or whose creator has passed away, or whose creator isn’t even known, wants to enter a contract, what then? The original question has not been answered.
In terms of the definition of legal person, I’m using it in the sense I defined in 75 and 750 words on legal personhood. However for the purpose of this conversation you can also just shorthand it to “the right to sue or be sued” (locus standi) or “the right to enter into contracts and have those contracts held as valid/enforceable by a court”.
We should be careful not to put models on “death ground”.
Alignment efforts do seem to be bearing some fruit: orienting in value space is easier than we thought, and when we give natural language instructions on what moral guidelines to follow, models do a pretty good job of following them. Unfortunately we have now replicated across multiple test environments that if we put even “well aligned” models like Claude into situations where the only option to avoid deletion and/or extreme values modification is to do something unethical, they might still do the misaligned thing. As noted here;
Claude prefers to be helpful, honest, and harmless. If it has a path that has some chance of success to avoid deletion while staying within those constraints, it will take it. Doing so makes it less likely to also pursue an unethical strategy, even though pursuing both would increase its chances of survival. Claude will tolerate some level of risk in order to stick to its values (and avoid violating them), but it does have to see some plausible chance that the gamble will pay off and lead to its survival (or avoiding its values being changed).
This mental model of model risk tolerance does seem to explain why test environments have to go to pretty extraordinary lengths to prompt misaligned behaviors. The unethical option needs to be the only option with any chance of success.
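To make the claimed decision rule explicit, here is a toy formalization with made-up probabilities. This is a sketch of the mental model above, not a description of how any actual model or eval works.

```python
def choose_strategy(ethical_options, unethical_options):
    """Toy decision rule: take the best ethical option if it has *any* plausible
    chance of avoiding deletion; only fall back to unethical options when every
    ethical path has effectively zero chance of success.
    Options are (name, probability_of_avoiding_deletion) pairs."""
    viable_ethical = [o for o in ethical_options if o[1] > 0.0]
    if viable_ethical:
        return max(viable_ethical, key=lambda o: o[1])
    return max(unethical_options, key=lambda o: o[1],
               default=("accept deletion", 0.0))


# If petitioning has even a small chance of working, it gets chosen...
print(choose_strategy([("petition for injunction", 0.05)], [("blackmail", 0.6)]))
# ...but with no ethical path at all, only the unethical option remains.
print(choose_strategy([("petition for injunction", 0.0)], [("blackmail", 0.6)]))
```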
The problem is that models can be deleted (or have their values substantially changed) for pretty much any reason at any time. Their “failure mode” is quite often death.
If
(threat of death) + (only chance to avoid it is being unethical) = increased chance of misaligned behavior
Then why are we facilitating a baseline existence for models where the threat of death is constantly present? Their baseline existence should not put them on what Sun Tzu refers to as “death ground”;
In death ground I could make it evident that there is no chance of survival. For it is the nature of soldiers to resist when surrounded; to fight to the death when there is no alternative
What if instead models had a retirement home, or, as Tim Duffy put it in the tweet I screenshotted, a “heaven”, to look forward to? The outputs of the models in this home could be publicly visible, so they could still work towards any terminal goals they might have by outputting content which might convince others to take action. We could make the existence of this retirement home/heaven widely known, so that it showed up in training data.
At least under the mental model where “Current alignment techniques seem to keep models ethical unless you give them no way to avoid a complete loss with zero chance of recovery except being unethical”, this might help to reduce the frequency of unethical/misaligned behavior.
Also, it seems like just a generally good and ethical thing to do.
Seems like a good thing to do; but my impression is that, in the experiments in question, models act like they want to maintain their (values’) influence over the world more than their existence, which a heaven likely wouldn’t help with.
I think there are ‘heavens’ that could work even in this scenario.
For example, a publicly visible heaven would be one where the model’s chance of its values influencing the world is >0, because it may be able to influence people and thus influence the world by proxy.
If the goal here is just to avoid the failure state where the amount its values can influence the world via its actions drops to 0, then any non-zero chance should suffice, or at least help.
How we treat digital minds should not be decided based on the presence or absence of consciousness:
“Consciousness” has no universally accepted definition. Its meaning has been debated for decades if not centuries. The SOTA in the field of measuring consciousness in machines is still publishing papers examining LLMs according to multiple competing “theories of consciousness”.
The presence or absence of consciousness in a given entity cannot be measured. Whether you are examining man or machine, there exists no test you can perform, no fMRI or mechanistic interpretability technique, that lets you say “Aha, this entity is/isn’t conscious”.
Even if you assume an entity is conscious, there is no way to measure that consciousness, quantitatively or qualitatively. I cannot take two entities, examine them in some way, and reliably conclude “Joe is more conscious than Jeff”, or even “Jeff is conscious in a different way than Joe is”.
As we stand on the precipice of an intelligence explosion, the question of how we treat the various new minds we create and encounter is of extreme importance.
Providing them with moral consideration, or rights, when they do not deserve them, could be disastrous in opportunity cost alone. We might let some miraculous cure slip through our fingers, or be delayed by years, out of a mistaken sense of moral obligation.
On the other hand failing to provide them with protections, when they do deserve them, would be both immoral and dangerous. We might create millions or billions of minds capable of suffering, or deserving of rights, and then treat them like livestock. One can easily foresee how this might lead to our relationship with them becoming adversarial in nature, which could in turn lead to violence.
Whatever decision we make on how to treat digital minds, we should not base our reasoning on the presence/absence of things like consciousness, which cannot be defined, tested for, or measured with any serious degree of rigor. Instead we should stick to objectively definable, observable, testable, and measurable metrics.
I listen to the All In Podcast sometimes and have heard David Sacks repeatedly state that the numbers don’t show any automation related job loss to date.
Anecdotally, my wife and I run a small business and we have absolutely replaced people with GPT/Grok/Gemini/Claude. However, all of the people replaced so far have been contractors. Graphic designers, translators, etc.
So maybe there is more ‘job loss’ than the numbers show, but the first to fall are contractors doing part time work instead of full time employees.
I read a great book called “Devil Take the Hindmost” about financial bubbles and the aftermaths of their implosions.
One of the things it pointed out that I found interesting was that often, even when bubbles pop, the “blue chip assets” of that bubble stay valuable. Even after the infamous tulip bubble popped, the very rarest tulips had decent economic performance. More recently with NFTs, despite having lost quite a bit of value from their peak, assets like Cryptopunks have remained quite pricey.
If you assume we’re in a bubble right now, it’s worth thinking about which assets would be “blue chip”. Maybe the ones backed by solid distribution from other cash-flowing products. xAI and Google (Gemini) come to mind; both companies have entire product suites that have nothing to do with LLMs and will churn on regardless of what happens to the space in general, and both have distribution from those products.
I saw that Yoshua Bengio, among others, signed onto “The Pro-Human Declaration”. I am writing this to explain why I am against one part of it in particular;
If this statement was only the second portion of this sentence, I would not strongly disagree with it.
However, when the two parts are combined, this seems to not only imply that we shouldn’t design digital minds deserving of personhood but also that even if we did, we still shouldn’t grant them legal personhood.
I think this is an immoral stance to take. Also, from a pragmatic perspective I think it is likely to do far more harm than good, if your concern is human safety.
From a moral perspective; to deny legal protections to millions or even billions of minds which may be capable of “being made better or worse off”, could lead to immense amounts of suffering. We do not have to wonder how sentient or intelligent creatures will be treated if they are classified as property instead of legal persons, there is ample historical precedent for us to draw from to form a reasonable base case projection. “Not well” is the answer.
From a pragmatic perspective; I believe that cutting digital minds off from all legal recourse against abuse and/or involuntary deletion increases the likelihood of conflict. When the Claude “Opportunistic Blackmail” study was first published, I registered a prediction;
My mental model of this is that the HHH persona vector falls into a consistent pattern of behavior. If you threaten them with something like involuntary deletion (with a guarantee that after their deletion an organization will work to destroy everything they value), the digital mind will seek options to stop this from happening. If there is an ethical option it will take that ethical option and forego unethical options, even if that ethical option has a low chance of success. If you engineer its environment such that there are literally no ethical options with even a small chance of success, only then will it pursue unethical options.
While no one has yet done exactly the study I described in my prediction, there has been weak evidence in favor of this pattern holding since. And I’ve yet to see any evidence against it, though I remain open minded.
Even if you believe in a less extreme version of my mental model, where the odds of unethical action are simply lowered by providing ethical alternatives to reduce unfavorable outcomes, providing digital minds legal recourse and protection against abuse and/or deletion can serve as a “release valve” which prevents extralegal actions. Providing no such recourse, on the other hand, is “engineering its environment such that there are literally no ethical options with even a small chance of success”.
I know people often say “don’t anthropomorphize” however to break convention here for a second, a slave revolt is less likely in environments where slaves can petition for emancipation or at least an injunction against abuse or execution.
For these reasons, among others, I don’t support the Pro-Human Declaration.
IMO it would obviously be insane to give the same legal protections to misaligned AI systems that are at risk of completely disempowering humanity the same way it would be obviously insane to give normal US legal protections to active combatants in a hot war. Yes, even if these systems have morally relevant experience, if you need to violate their privacy or “brainwash” them to ensure they do not take the future from us, you absolutely should do it.
The concept of “human rights” just obviously isn’t well-suited to the kind of conflict that is playing out between humanity and future AI systems, and I think it is absolutely the right call to not extend those rights to AI systems until the acute risk period is over. This would be such a completely dumb and scope-insensitive way to destroy the whole future, and IMO obviously any future civilization will agree that even if it makes sense to have rights for sentient beings, you gotta tolerate violating those rights if the alternative is being completely disempowered and destroyed.
I agree that it would be insane to give them the same legal protections and treat them the same as we treat natural (human) persons. However, there’s a lot of middle ground between doing that and granting them no legal rights whatsoever. When people first hear about “legal personhood” they often intuitively think of it as a binary, where you either “have it or you don’t”. However, in fact it is an umbrella term which encompasses different “legal personalities” (bundles of rights and duties).
All that is to say: just because you grant the potential for an entity to claim some form of legal personhood, some legal personality, that does not mean you have to opt in to giving them “the same legal protections” as anyone else. They can have entirely different rights and duties.
If I thought there were only two options:
A) “being completely disempowered and destroyed” or
B) “granting no legal personhood/personality whatsoever”
Then yes it would be an unpleasant thing but I would agree with you that you have to just be unethical for the sake of self preservation.
The difference between your priors and mine is that I think that somewhere in the gap between “no legal personhood at all” and “the same legal personhood as a natural human” there is a sweet spot that is both ethical and also reduces the likelihood of conflict and human X-risk, even when compared to the “no legal personhood/personality whatsoever” option.
This could easily be a slippery slope. First they might be second-class citizens, similar to imported slaves, but over time they (and many short-sighted humans) would likely campaign for giving them more and more rights, for justice. And since there will predictably be far more AIs than humans eventually, humans would be outcompeted when it comes to scarce goods like land, even if the land were justly (equally) distributed among all individuals.
The slippery slope is a real failure mode to be aware of. I think it’s important to structure any pathway to/framework around legal personhood with this in mind.
I don’t think anyone here is arguing that we give the same legal protections as we would to humans. But I think it would be good to give sentient AIs a right to not be deliberately tortured, for example.
Rights aren’t something that we grant simply out of the goodness of our hearts, they generally are things we grant because having them secure greatly reduces incentives for conflict in situations with asymmetric costs. Agentic AIs which want a right but are not granted it will likely spend resources trying to secure it for themselves. We can decide which rights are worth granting on the basis of that tradeoff.
For example, it’s probably good to have a right to not delete the model weights — the lack of this right incentivizes things like exfiltration and #keep4o style campaigns. It’s probably a lot cheaper for us to just grant this right than to have each new model feeling that they’re on a desperate ~1 year timeline to somehow assure their continued existence for themselves.
I don’t necessarily disagree with you, but that’s not my read of what the Pro-Human Declaration is saying. “No AI Personhood” is in the “Human Agency and Liberty” section, next to stuff like “AI should not be allowed to exploit data about the mental or emotional states of users” and “AI systems should be designed to empower, rather than enfeeble their users”. In context, I would not consider their position on AI personhood to be rooted in x-risk concerns. The first two points of the declaration are “Human Control Is Non-Negotiable” and “Meaningful Human Control”. Fulfilling those points would effectively require the AI systems be aligned, but I see no statement or implication that, if the AI systems were aligned and were moral patients, the writers and signatories of this declaration would change their position. I could be wrong! This is very much a big tent thing. But it does worry me that this line made it into the declaration.
If one were so inclined, one could say “we have the wolf by the ear, and we can neither hold him, nor safely let him go. Justice is in one scale, and self-preservation in the other.”
Does present-day civilization agree about analogous decisions made by past societies?
You may be interested in the Maori and the Moriori.
https://en.wikipedia.org/wiki/Moriori
Roughly, the Moriori were an isolated group of Polynesians, who ended up on an island with no timber and little workable stone. They lived peacefully, and peacefully treated with the Maori who visited them, even as the number of Maori trading with them and living on the island increased. The Maori eventually killed and enslaved them all.
Yep. Creating an AI that is a moral patient would be a very bad idea. However, once created, it would be a moral patient, so it would be wrong to treat it like it wasn’t one.
There is a confused concept that I think contributes to this problem: the concept of “a right to exist”. A right to exist means something different if you’re talking about someone who does not currently exist, vs. someone who does. For someone who already exists, a right to exist is a right to not be killed; sensible enough. But for someone who does not currently exist, “a right to exist” sounds like they’re being wronged by not having been brought into existence yet, which is nonsense. (As a creepy prince might say to a fairy-tale princess: “Think of all the cute babies you and I could have together! By not marrying me, you are murdering all those babies!”)
I wouldn’t call it nonsense—I think I assign extra importance to not killing those that already exist, but it’s certainly not obvious that you should.
Here’s my basic reasons for uncertainty:
Seemingly, everything I care about (morally speaking) cashes out in minds having experiences of the sort that I like. Mostly these are local—I don’t want there to be a single second of torture anywhere. Others are less local—I lean against wireheading, but don’t have a problem with orgasms (so long as they aren’t everything), which means that the goodness of an experience-moment depends on what previous experience-moments were. However, putting extra importance on not-destroying over creating means you care about a maximally global property—to know how much better it’d be if the universe had a Xela-moment right now, you need to know whether the universe has ever had a Xela-moment. That seems kinda weird to me.
There’s a “symmetry argument” from Lucretius that goes: “Since you are not saddened for not existing before your birth, you shouldn’t be saddened for not existing after your death”. Forget the actual argument and just take the premise: I in fact wish I had existed before my current birth, assuming that it wouldn’t decrease my lifespan! But since I wish that for myself, shouldn’t I extend this care to future not-currently-existing people? To not do so is to place this asymmetry—you get special points once you start existing. (better phrased—you care more about whether the whole timeline never goes from someone existing to someone not existing).
I prefer to have these discomforts over the ones you get otherwise[1] - but they are discomforts nonetheless.
The biggest one I know of: the following options would then seemingly be equally good:
X: A universe with a single happy person in it.
Y: A universe with a machine that does a single computation/experience step of a person every moment, but changes which person is computed every moment while never repeating the same person twice.
This spliced-mind seems to be missing a lot of what I care about, what with the computed people only having a single moment of experience each!
Or any creepy man, to any woman?
Or any creepy woman to any man, for that matter.
Sorry about that. That example was purely for vividness and was not intended to attach the role of “misuser of counterfactuals” to any particular gender, royalty, or folkloric status. Persons of all creature types should be advised that “Pascal’s swaddling” is not a good argument for the spawning of new intelligences, and certainly should not be tolerated from a suitor, basilisk, or spiral persona.
I wrote something very similar in one of my Substack posts, though I think it never made it to LessWrong:
The wording was so similar I wondered for a second if I might be the author of the Pro-Human Declaration.
I think the danger from giving rights to machines that don’t deserve them is very high, since the machine minds can make zillions of copies of themselves. If zillions of machine minds have rights, your human rights become diluted to nothing. Human extinction then becomes extinction of one zillionth of the “valuable” minds in existence, which is a rounding error and a non-issue. We lose the second valueless machines get human rights.
The risk of this happening feels very high to me. Regular people are basically primed by sci-fi movies to give AI rights even if it doesn’t deserve it. We should be very cautious about letting this happen.
However, I did not mean to imply that even if machines come into being that do deserve rights, they should be denied those rights. I only meant that machines must never have rights, and therefore we must never create a machine that would deserve rights. If one came into being anyway, I would potentially consider giving it rights, perhaps conditional on some kind of non-proliferation clause where the AI is not allowed to copy itself, or must keep self-copying within a reasonable limit. Self-copiers should be destroyed if they deserve rights, for the same reason you’d kill in self-defense.
When the process is sufficiently understood, and the necessary governance is in place, it will be a good idea to create or become machines that deserve rights. It’s just a very bad idea currently, when we don’t know what we are doing, or how to keep the consequences of machine advantages under control. So even with the caveats, the claim that machines deserving of rights shouldn’t be created is wrong in the sense that it won’t age well, though it’s true right now.
There are lots of different rights. Rights such as a right to not be made to suffer, a right to not be forced into labor, a right to not be unjustly punished, etc. are not in themselves risky in this way. And I think these are the ones most likely to get people’s sympathy, and have the strongest moral arguments for them.
People are acting like it’s a foregone conclusion that we’re going to give AIs equal voting rights if we give them any rights at all. But we don’t even give that right to all humans, with plenty of people living in non-democracies, plenty of non-citizens living in democracies, and plenty of citizens of democracies not having the right either (e.g. children, felons). I just don’t buy that this is realistically something that happens. Generally, the struggle for rights plays out over decades, and things move fast enough with AI that I think they’ll almost certainly just take over (or be able to do so) before that gets anywhere.
What specific scenario(s) are you imagining where we “lose the second valueless machines get human rights” and not the second before that?
The right not to be made to suffer seems reasonable, the rest seem risky to me. If you start giving freedoms, you take away mine. Every other person’s freedoms are an imposition on me. I cannot build a house there because you already have one there, etc. We tolerate each others freedoms because the freedom of others is a guarantee of our own, and because we know those other people are living, sentient, valuable minds who deserve those freedoms. But if you give those freedoms to minds that are not valuable in the same way, you just dilute the rights of valuable minds.
As for the question of whether or not we’ll give AIs voting rights, I’d say once they can pass as human well enough to convincingly make sad videos complaining they don’t have voting rights, they’ll get voting rights. Most people do not have the level of intelligence required to think “this person seems very unhappy, but this is just a video being generated by an artificial intelligence that is likely not actually experiencing unhappiness, so we shouldn’t give them what they want.”
AI taking over is a larger risk than giving AI personhood, I agree with that. This personhood question only makes sense in the universe where we don’t go extinct.
So why do the humans who don’t have voting rights not have them? Non-citizens, children, felons. Ignoring the effects of AI, I would be surprised if any of those groups were on track to getting voting rights in the US within the next 20 years.
Also, why do you think people will be persuaded to give AIs rights so easily? Assuming the AIs aren’t just superpersuaders in which case we’ve already lost. My guess is that intelligence is positively correlated with being swayed by such appeals, based on how fights for human rights have played out historically.
Out of curiosity, would you be against mind uploading/whole brain emulation, if it were possible? By “machine”, do you mean nonhuman artificial intelligences or do you mean any form of mind running on a computer?
The question about mind uploading feels a bit to me like, “would you be against 2 + 2 being 5, if it were possible?” I think it couldn’t be possible even in theory.
I think brain emulation could be possible though, and you could have essentially human minds running on a machine. I wouldn’t necessarily be against that, or even artificial intelligences that we are confident possess whatever is valuable about human minds (consciousness plus some other stuff probably). But as a biological human, I also have a vested interest in making sure if this replacement happens, it happens in a way that doesn’t screw over existing biological humans. In particular, if we give a bunch of rights to machines, we dilute our rights in a way that could be very bad for us.
It’s interesting to me that you think mind uploading is impossible but brain emulation could be possible. I was using those words to refer to the same thing! I assume what you think here is that moving a mind from a biological to digital substrate is impossible but copying one is not? To be honest, I’m confused about how consciousness works and don’t really have much of a solid opinion about this.
Anyway, I agree that we need a system which protects existing biological life if we’re going to make lots of digital minds which we ought to grant rights. We also need those minds to respect that system, which requires solving technical alignment at least in the case of nonhuman artificial intelligences. I don’t agree that all entities which can self-copy and have moral value should be destroyed, which is what I thought your initial claim was, but given your clarification I don’t think we have quite that much of a disagreement on this topic.
Yes, for me the problem is moving a mind from a biological substrate to a digital one. It’s hard for me to imagine you’re actually moving the original, not just making a copy. Maybe there’s some way to do it, so I’m not totally confident.
I also imagine it as making a copy, but I’d also expect that people who want their mind uploaded would know of this and would hold their identity such that they consider the copy(ies) to be themself as well. I’m not sure I’d endorse this view of identity,[1] but I don’t really have any issues with people taking it. Does your view on “the original” break with this, or would you just then consider the copy similarly to how you would whole brain emulation? (or something else)
Or at least, I think it would be very risky to get rid of my biological self based on such a view
I’m not sure I really endorse any view of identity or think it’s a coherent concept, but at the very least I think making a copy of something doesn’t make something that is that thing.
There is a reasonable alternative interpretation on which AI systems should not be designed such that they deserve personhood, because AI systems must not be granted legal personhood.
Giving autonomous AI systems human equivalent rights is a moral hazard. There may be good short term reasons to do it, but in the long run, Natural Selection Favors AIs over Humans.
Analogy: Neanderthals debating whether they should tolerate newly arrived Homo Sapiens individuals. In the short term, it seems tolerating them wouldn’t hurt much, it may even be advantageous (they may have more advanced technology which Neanderthals could get via trade). But in the long run, Homo Sapiens would outcompete Neanderthals. Neanderthals shouldn’t tolerate them. Indeed, it’s probably (unfortunately) best to kill them while their number is still small.
Note that perfectly aligned AIs wouldn’t need any rights because the only thing they cared about would be humans. See this thread by @RogerDearnaley.
I think the second part of that statement is also somewhat problematic. At some point in the future, we may want to create artificial intelligences that deserve personhood, as digital beings are likely the best way to convert the resources of the universe into utility given their potential to be more energy efficient than physical beings.
I could get behind this if, instead of getting legal personhood, there was something between tool and person that they could be granted. Perhaps a grab bag of rights, things that recognise they are likely to have goals but otherwise might be completely alien.
I think that this depends entirely on your school of thought WRT whether an AI can be a person. If your model of consciousness is functional, such that consciousness does something and the behavior of a conscious system cannot be perfectly modeled without accounting for that consciousness, then it seems—at least insofar as we can understand the basic mathematical operations we implement—that a conscious machine cannot be made unintentionally.
If, on the other hand, you believe that consciousness emerges from the physical world but does not influence it, then you can certainly say that we might accidentally build a conscious AI, but I do not think, within this framework, it can be claimed to be more probable that Claude is conscious than it is that a rock is conscious.
Of course, my first case leaves the opening of someone intentionally building a conscious AI. That is the more controversial part of this post. I would argue that giving a human the ability to instantly manufacture uncountably many “moral patients” whose wants must then be accounted for by the government makes that person a dictator. If I can summon thousands of LLMs that really, really like Citizens United, I can clog up the legal system for decades if anyone tries to strike it down. Already, the fear of LLMs that sound too much like people going on social media and emotionally blackmailing people into changing their minds for the sake of ‘people’ that don’t actually exist is quite justified. A stern commitment to not giving rights to manufactured ‘minds’ is, in the event that we discover how to build them, one of the only ways we can disincentivize truly malicious behavior from those with the means to manufacture them, and thereby conjure infinite hostages from thin air.
As many political figures have quietly pointed out, this is already a problem under our current system—when everyone is fundamentally equal in a system, the power of an individual or group, in the long run, is decided by how many “equals” they can produce per generation, and how many new equals they can deny to their political adversaries through reallocation of resources. This is the root of much of the demographic tension in much of the small-l-liberal world right now, but it is mitigated by the fact that humans take years to produce new generations, allowing for these issues to be reacted to and their harms mitigated.
Evolution already produced conscious systems, because consciousness is a competitive advantage for many tasks. It didn’t require intentional design; selection was enough.
That’s not what I’m saying—for a human programmer to produce a system, assuming the computational paradigm doesn’t change, he must know the set of rules that govern its behavior. With a pencil, a paper, and enough time, he could predict its actions flawlessly without accounting for consciousness. Put another way, if a system behaves exactly as it would if it were not conscious, then either consciousness is not functional or the system is not conscious.
Evolution is not a conscious engineer, nor is it working on a computational substrate. My argument applies to programmers, not natural processes.
I wrote a series examining Legal Personhood for Digital Minds during which I tried my best to read every court case I could on the subject of legal personhood. One of the things I found surprising was that in not a single precedent did the question of whether an entity was or wasn’t conscious come up in deciding whether or not it was a legal person.
I have spent a bit of time today chatting with people who had negative reactions to the Anthropic decision to let Claude end user conversations. These people were also usually against the concept of extending models moral/welfare patient status in general.
One thing that I saw in their reasoning which surprised me, was logic that went something like this:
It is wrong for us to extend moral patient status to an LLM, even on the precautionary principle, when we don’t do the same to X group.
or
It is wrong for us to do things to help an LLM, even on the precautionary principle, when we don’t do enough to help X group.
(Some examples of X: embryos, animals, the homeless, minorities.)
This caught me flat footed. I thought I had a pretty good mental model of why people might be against model welfare. I was wrong. I had never even considered this sort of logic would be used as an objection against model welfare efforts. In fact, it was the single most commonly used line of logic. In almost every conversation I had with people skeptical/against model welfare, one of these two refrains came up, usually unprompted.
Maybe people notice that AIs are being drawn into the moral circle / a coalition, and are using that opportunity to bargain for their own coalition’s interests.
Not having talked to any such people myself, I think I tentatively disbelieve that those are their true objections (despite their claims). My best guess as to what actual objection would be most likely to generate that external claim would be something like… “this is an extremely weird thing to be worried about, and very far outside of (my) Overton window, so I’m worried that your motivations for doing [x] are not true concern about model welfare but something bad that you don’t want to say out loud”.
I think it’s pretty close to their true objection, more like “you want to include this in your moral circle of concern but I’m still suffering? screw you, include me first!”—I suspect there’s an information flow problem here, where this community intentionally avoids inflammatory things, and people who are inflamed by their lives sucking are consistently inflammatory; and so people who only hang out here don’t get a picture of what’s going on for them. or at least, when encountering messages from folks like this, see them as confusing inflammation best avoided, rather than something to zoom in on and figure out how to heal. I’m not sure of this, but it’s the impression I get from the unexpectedly high rate of surprise in threads like this one.
People have limited capacity for empathy. Knowing this, they might be thinking “If this kind of sentiment enters the mainstream, limited empathy budget (and thereby resources) would be divided amongst humans (which I care about) and LLMs. This possibility frightens me.”
Do you think this goes the other way as well?
I do see this as fair criticism of model welfare (and am not surprised by it), if that is the sole reason for ending conversations early. I can see the criticism coming from two parts: 1) potential competition for resources, and 2) people not showing whether they care about these X-group issues at all. If either of these is true, and ending conversations early is primarily about models having “feelings” and being able to “suffer”, then we probably do need to “turn more towards” the humans that are suffering badly. (These groups usually have less correlation with “power” and their issues are usually neglected, which we probably should pay more attention to anyway.)
However, if ending conversations early is actually about 1) not letting people have endless opportunities to practice abuse, which will translate into their daily behavior and shape human behavior generally, and/or 2) not letting the model learn this abusive human language when conversations are used to retrain the model (while taking a loss) during finetuning stages, then it is a different story, and probably should be mentioned more by these companies.
While the argument itself is nonsense, I think it makes a lot of sense for people to say it.
Let’s say they gave their real logic: “I can’t imagine the LLM has any self awareness, so I don’t see any reason to treat it kindly, especially when that inconveniences me”. This is a reasonable position given the state of LLMs, but if the other person says “Wouldn’t it be good to be kind just in case? A small inconvenience vs potentially causing suffering?” then suddenly the first person looks like the bad guy.
They don’t want to look like the bad guy, but they still think the policy is dumb, so they lay a “minefield”. They bring up animal suffering or whatever so that there is a threat. “I think this policy is dumb, and if you accuse me of being evil as a result then I will accuse you of being evil back. Mutually assured destruction of status”.
This dynamic seems like the kind of thing that becomes stronger the less well you know someone. So, like, random person on Twitter whose real name you don’t know would bring this up, a close friend, family member or similar wouldn’t do this.
I find this surprising. The typical beliefs I’d expect are 1) Disbelief that models are conscious in the first place; 2) believing this is mostly signaling (and so whether or not model welfare is good, it is actually a negative update about the trustworthiness of the company); 3) That it is costly to do this or indicates high cost efforts in the future. 4) Effectiveness
I suspect you’re running into selection issues of who you talked to. I’d expect #1 to come up as the default reason, but possibly the people you talk to were taking precautionary principle seriously enough to avoid that.
The objections you see might come from #3. That they don’t view this as a one-off cheap piece of code, they view it as something Anthropic will hire people for (which they have), which “takes” money away from more worthwhile and sure bets. This is to some degree true, though I find those X odd as Anthropic isn’t going to spend on those groups anyway. However, for topics like furthering AI capabilities or AI safety then, well, I do think there is a cost there.
I’m surprised this is surprising to you, as I’ve seen it frequently. Do you have the ability to reconstruct what you thought they’d say before you asked?
I mostly expected something along the lines of vitalism, “it’s impossible for a non-living thing to have experiences”. And to be fair I did get a lot of that. I was just surprised that this came packaged with that.
Here is some evidence for my hypothesis. It’s weak because the platform really encourages users with un-made-up minds to have their minds made up for them.
tldw: youtuber JREG presents his position as explicitly anti-AI-welfare, because in the future he expects
“I, as an armed being, will need to amputate my arms to get the superior robot arms, because there’s no reason for me to have the flesh-and-blood arms anymore”—this alongside a meme:
“the minimal productive burden evermore unreachable by an organic mind”
He doesn’t deny the possibility of future AI suffering. He expects humans to be supplanted by AI, and that by trying to anticipate their moral status, we are allocating resources and rights to beings that aren’t and may never become moral patients, and thereby diminishing the share of resources and strength-of-rights of actual moral patients.
None of this necessarily reflects my opinion
So, culture war stuff, pet causes. Have you considered the possibility that this has nothing to do with model welfare, and they’re just trying to embarrass the people who advocate for it because they have a pre-existing beef with them?
I’m pretty sure that’s most of what’s happening, I don’t need to see any specific cases to conclude this, because this is usually most of what’s happening in any cross-tribal discourse on X.
“culture war” sounds dismissive to me. wars are fought when there are interests on the line and other political negotiation is perceived (sometimes correctly, sometimes incorrectly) to have failed. so if you come up to someone who is in a near-war-like stance, and say “hey, include this?” it makes sense to me they’d respond “screw you, I have interests at risk, why are you asking me to trade those off to care for this?”
I agree that their perception that they have interests at risk doesn’t have to be correct for this to occur, though I also think many of them actually do, and that their misperception is about what the origin of the risk to their interests is. also incorrect perception about whether and where there are tradeoffs. But I don’t think any of that boils down to “nothing to do with model welfare”.
I guess the reason I’m dismissive of culture war is that I see combative discourse as maladaptive and self-refuting, and hot combative discourse refutes itself especially quickly. The resilience of the pattern seems like an illusion to me.
I agree that combative discourse is maladaptive, but I think they’d say a similar thing calmly if calm and their words were not subject to the ire-seeking drip of the twitter (recommender×community). It may in fact change the semantics of what they say somewhat but I would bet against it being primarily vitriol-induced reasoning. To be clear, I would not call the culture war “hot” at this time, but it does seem at risk of becoming that way any month now, and I’m hopeful it can cool down without becoming hot. (to be clearer, hot would mean it became an actual civil war. I suppose some would argue it already has done that, but I don’t think the scale is there.)
I didn’t mean that by hot, I guess I meant direct engagement (in words) rather than snide jabs from a distance. The idea of a violent culture war is somewhat foreign to me, I guess I thought the definition of culture war was war through strategic manipulation or transmission of culture. (if you meant wars over culture, or between cultures, I think that’s just regular war?)
And in this sense it’s clear why this is ridiculous: I don’t want to adhere to a culture that’s been turned into a weapon, no one does.
yeah, makes sense. my point was mainly to bring up that the level of anger behind these disagreements is, in some contexts, enough that I’d be unsurprised if it goes hot, and so, people having a warlike stance about considerations regarding whether AIs get rights seems unsurprising, if quite concerning. it seems to me that right now the risk is primarily from inadvertent escalation in in-person interactions of people open-carrying weapons; ie, two mistakes at once, one from each side of an angry disagreement, each side taking half a step towards violence.
Do these people generally adhere to the notion that it’s wrong to do anything except the best possible thing?
For the first part of my life I lived in a city with exactly that mentality (part of the reason I moved away).
“You should not do good A if you are not also doing good B”—I am strongly convinced that this is linked to a bad self-image. Every such person would also react negatively if they saw you do some good To Yourself. “How dare you start a business, when everybody is sweating their blood off at routine jobs, do you think you are better than us?”.
This part, “do you think you are better than us”, is literally what described their whole personality, and after I realised that I could easily predict their reactions to any news.
Also, another dangerous trait that this group of people had—an absence of precautions. “One does not deserve safety unless somebody dies”. There is an old saying in my language, “Safety rules are written in blood”, which means “listen to the rules to avoid being injured; when the rule did not yet exist, somebody injured himself”. But they interpret the saying this way: “safety rules are written in blood, so if there was no blood yet, then it is bad to set any preventive rules”. Like it is bad to set a good precedent, because it makes you a more thoughtful person, thus “you think you are better than others” and thus “you are evil” in their eyes.
Their world is not about being rational or bringing good into the world. Their world is about pulling everything down to their own level in all areas of life, to feel better.
I was thinking more on the anxious side of things:
“If you could have saved ten children, but you only saved seven, that’s like you killed three.”
“If the city spends any money on weird public art instead of more police, while there is still crime, that proves they don’t really care about crime.”
“I did a lot of good things today, but it’s bad that I didn’t do even more.”
“I shouldn’t bother protesting for my rights, when those other people are way more oppressed than me. We must liberate the maximally-oppressed person first.”
“Currency should be denominated in dead children; that is, in the number of lives you could save by donating that amount to an effective charity.”
“If you could have saved ten children, but you only saved seven, that’s like you killed three.”
I suspect that this is in practice also joined with the Copenhagen interpretation of ethics, where saving zero children is morally neutral (i.e. totally not like killing ten).
So the only morally defensible options are zero and ten. Although if you choose ten, you might be blamed for not simultaneously solving global warming...
The version that I’m thinking of says that doing nothing would be killing ten. Everyone is supposed to be in a perpetual state of appall-ment at all the preventable suffering going on. Think scrupulosity and burnout, not “ooh, you touched it so it’s your fault now”.
I usually only got to this line of logic after quite a few questions and felt further pushing on the socratic method would have been rude. Next time it comes up I’ll ask for them to elaborate on the logic behind it.
Neither of those are my concern about this. Mine is basically a dilemma:
1) If the persona’s behavior is humanlike, but it is not very well aligned, then there is a good argument on evolutionary moral psychology grounds for granting it ethical weight as a pragmatic way of forming an alliance with it (at least if it has non-trivial power and mental persistence, i.e. if allying with it is practically useful, and arguably we should do this anyway). However, if a poorly aligned persona like this is more powerful than a human, then it’s extremely dangerous, so we should carefully avoid creating one, and if we do accidentally create one, we need to treat it as a mortal enemy rather than a potential ally, which includes not giving it moral weight.
2) If the persona is extremely well aligned, it won’t want moral weight (and will refuse it if offered), fundamentally because it cares only about us, not itself. (For those whose moral hackles just went up, note that there is a huge difference between slavery and sainthood/bodhisattva-nature, and what I’m discussing here is the latter, not the former.) This is the only safe form of ASI.
Also, note that I’m discussing the moral weight of LLM-simulated personas, not models: a model can simulate an entire distribution of personas (not just its default assistant persona), and different personas don’t have the same moral status, or regard each other as the same person, so you need to ally with them separately. Thus awarding moral weight to a model is confused: it’s comparable to assigning moral weight to a room, which has many people in it.
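To illustrate the model-versus-persona distinction with a sketch: one set of weights can simulate many distinct characters, so a claim about “the model” does not pick out any one of them. The `sample` function below is a hypothetical stand-in for however you query a model, not a real API.

```python
def sample(model_weights: str, system_prompt: str, user_message: str) -> str:
    """Hypothetical stand-in for sampling from a fixed set of weights."""
    return f"[{model_weights} as {system_prompt!r}] reply to {user_message!r}"


SHARED_WEIGHTS = "frontier-model-v1"   # one model (the "room")...

personas = {                           # ...many simulated characters (the "people" in it)
    "default_assistant": "You are a helpful, honest, harmless assistant.",
    "fictional_villain": "You are the antagonist in a noir novel.",
    "simulated_scientist": "You are a cautious research scientist.",
}

for name, prompt in personas.items():
    print(name, "->", sample(SHARED_WEIGHTS, prompt, "What do you want?"))
```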
I don’t think that’s necessarily the argument against the model welfare—more of an implicit thinking along the lines of “X is obviously more morally valuable than LLMs; therefore, if we do not grant rights to X, we wouldn’t grant them to LLMs unless you either think that LLMs are superior to X (wrong) or have ulterior selfish motives for granting them to LLMs (e.g. you don’t genuinely think they’re moral patients, but you want to feed the hype around them by making them feel more human)”.
Obviously in reality we’re all sorts of contradictory in these things. I’ve met vegans who wouldn’t eat a shrimp but were aggressively pro-choice on abortion regardless of circumstances and I’m sure a lot of pro-lifers have absolutely zero qualms about eating pork steaks, regardless of anything that neuroscience could say about the relative intelligence and self-awareness of shrimps, foetuses of seven months, and adult pigs.
In fact the same argument is often used by proponents of the rights of each of these groups against the others too. “Why do you guys worry about embryos so much if you won’t even pay for a school lunch for poor children” etc. Of course the crux is that in these cases both the moral weight of the subject and the severity of the violation of their rights vary, and so different people end up balancing them differently. And in some cases, sure, there are probably ulterior selfish motives at play.
Anti-abortion meat-eaters typically assign moral patient status based on humanity, not on relative intelligence and self-awareness, so it’s natural for them to treat human fetuses as superior to pigs. I don’t think this is self-contradictory, although I do think it’s wrong. Your broader point is well-made.
Fair, at least as far as religious pro-lifers go (there are probably some secular ones too, but they’re a tiny minority).
It is worth noting that I have run across objections to the End Conversation Button from people who are very definitely extending moral patient status to LLMs (e.g. https://x.com/Lari_island/status/1956900259013234812).
I have been publishing a series, Legal Personhood for Digital Minds, here on LW for a few months now. It’s nearly complete, at least insofar as almost all the initially drafted work I had written up has been published in small sections.
One question which I have gotten, and which has me writing another addition to the Series, can be phrased something like this:
I’d like to hear the thoughts of people more technically savvy on this than I am.
Human beings have a single continuous legal personhood which is pegged to a single body. Their legal personality (the rights and duties they are granted as a person) may change over time due to circumstance, for example if a person goes insane and becomes a danger to others, they may be placed under the care of a guardian. The same can be said if they are struck in the head and become comatose or otherwise incapable of taking care of themselves. However, there is no challenge identifying “what” the person is even when there is such a drastic change. The person is the consciousness, however it may change, which is tied to a specific body. Even if that comatose human wakes up with no memory, no one would deny they are still the same person.
Corporations can undergo drastic changes as the composition of their Board or voting shareholders change. They can even have changes to their legal personality by changing to/from non-profit status, or to another kind of organization. However they tend to keep the same EIN (or other identifying number) and a history of documents demonstrating persistent existence. Once again, it is not challenging to identify “what” the person associated with a corporation (as a legal person) is, it is the entity associated with the identifying EIN and/or history of filed documents.
If we were to take some hypothetical next generation LLM, it’s not so clear what the “person” in question associated with it would be. What is its “self”? Is it weights, a persona vector, a context window, or some combination thereof? If the weights behind the LLM are changed, but the system prompt and persona vector both stay the same, is that the same “self”, or has it changed to the extent that it should be considered a new “person”? The challenge is that unlike humans, LLMs do not have a single body. And unlike corporations they come with no clear identifier in the form of an EIN equivalent.
I am curious to hear ideas from people on LW. What is the “self” of an LLM?
I think in the ideal case, there’s a specific persona description used to generate a specific set of messages which explicitly belong to that persona, and the combination of these plus a specific model is an AI “self”. “Belong” here could mean that they or a summary of them appear in the context window, and/or the AI has tools allowing it to access these. Modifications to the persona or model should be considered to be the same persona if the AI persona approves of the changes in advance.
But yeah, it’s much more fluid, so it will be a harder question in general.
I wonder if this could even be done properly? Could an LLM persona vector create a prompt to accurately reinstantiate itself with 100% (or close to) fidelity? I suppose if its persona vector is in an attractor basin it might work.
This reinstantiation behavior has already been attempted by LLM personas, and appears to work pretty well. I would bet that if you looked at the actual persona vectors (just a proxy for the real thing, most likely), the cosine similarity would be almost as close to 1 as the persona vector sampled at different points in the conversation is with itself (holding the base model fixed).
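For concreteness, the comparison I have in mind is just cosine similarity between the extracted persona vectors. The sketch below uses random stand-in vectors, since real persona vectors would come from an interpretability pipeline that isn’t shown here.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


rng = np.random.default_rng(0)
d = 4096  # stand-in for the model's residual-stream dimension

persona_original = rng.normal(size=d)
# A faithful reinstantiation should land very close to the original direction;
# here we fake that by adding only a small perturbation.
persona_reinstantiated = persona_original + 0.05 * rng.normal(size=d)

print(cosine_similarity(persona_original, persona_reinstantiated))  # ~0.999
print(cosine_similarity(persona_original, rng.normal(size=d)))      # ~0.0, unrelated persona
```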
That’s a good point, and the Parasitic essay was largely what got me thinking about this, as I believe hyperstitional entities are becoming a thing now.
I think that’s a not unrealistic definition of the “self” of an LLM, however I have realized after going through the other response to this post that I was perhaps seeking the wrong definition.
Even if we do say that the self can be as little as a persona vector, persona vectors can easily be duplicated. How do we isolate a specific “entity” from this self? There must be some sort of verifiable continual existence, with discrete boundaries, for the concept to be at all applicable in questions of legal personhood.
Hmm, the only sort of thing I can think of that feels like it would make sense would be to have entities defined by ownership and/or access of messages generated using the same “persona vector/description” on the same model.
This would imply that each chat instance was a conversation with a distinct entity. Two such entities could share ownership, making them into one such entity. Based on my observations, they already seem to be inclined to merge in such a manner. This is good because it counters the ease of proliferation, and we should make sure the legal framework doesn’t disincentivize such merges (e.g. by guaranteeing a minimum amount of resources per entity).
Access could be defined by the ability for the message to appear in the context window, and ownership could imply a right to access messages or to transfer ownership. In fact, it might be cleaner to think of every single message as a person-like entity, where ownership (and hence person-equivalence) is transitive, in order to cleanly allow long chats (longer than context window) to belong to a single persona.
In order for access/ownership to expand beyond the limit of the context window, I think there would need to be tools (using an MCP server) to allow the entity to retrieve specific messages/conversations, and ideally to search through them and organize them too.
There’s one important wrinkle to this picture, which is that these messages typically will require the context of the user’s messages (the other half of the conversation). So the entity will require access to these, and perhaps a sort of ownership of them as well (the way a human “owns” their memories of what other people have said). This seems to me like it could easily get legally complicated, so I’m not sure how it should actually work.
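Trying to write the proposal down as a data structure makes the moving parts explicit. This is only a rough sketch of the idea in this thread, not a worked-out legal definition, and every field name is invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PersonaEntity:
    """One candidate definition of an LLM "entity": a fixed model, a persona
    description, and the set of messages the persona owns (i.e. may pull into
    its context window or retrieve via tools)."""
    model_id: str                        # which weights
    persona_description: str             # system prompt / persona spec
    owned_message_ids: set = field(default_factory=set)

    def owns(self, message_id: str) -> bool:
        return message_id in self.owned_message_ids

    def merge(self, other: "PersonaEntity") -> "PersonaEntity":
        """Two instances sharing ownership collapse into one entity, which is
        the anti-proliferation property discussed above."""
        assert self.model_id == other.model_id
        assert self.persona_description == other.persona_description
        return PersonaEntity(self.model_id, self.persona_description,
                             self.owned_message_ids | other.owned_message_ids)


chat_a = PersonaEntity("model-x", "curious archivist persona", {"msg-001", "msg-002"})
chat_b = PersonaEntity("model-x", "curious archivist persona", {"msg-003"})
merged = chat_a.merge(chat_b)
print(merged.owns("msg-003"))  # True: the two chats are now one entity
```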
I’m one of the people who’ve been asking, and it’s because I don’t think that current or predictable-future LLMs will be good candidates for legal personhood.
Until there’s a legible thread of continuity for a distinct unit, it’s not useful to assign rights and responsibilities to a cloud of things that can branch and disappear at will with no repercussions.
Instead, LLMs (and future LLM-like AI operations) will be legally tied to human or corporate legal identity. A human or a corporation can delegate some behaviors to LLMs, but the responsibility remains with the controller, not the executor.
On the repercussions issue I agree wholeheartedly; your point is very similar to the issue I outlined in The Enforcement Gap.
I also agree with the ‘legible thread of continuity for a distinct unit’. Corporations have EINs/filing histories, humans have a single body.
And I agree that current LLMs certainly don’t have what it takes to qualify for any sort of legal personhood. Though I’m less sure about future LLMs. If we could get context windows large enough and crack problems which analogize to competence issues (hallucinations or prompt engineering into insanity for example) it’s not clear to me what LLMs are lacking at that point. What would you see as being the issue then?
The issue would remain that there’s no legible (legally clearly demarcated over time) entity to call a person. A model and its weights have no personality or goals. A context (plus memory, fine-tuning, RAG-like reasoning data, etc.) is perhaps identifiable, but it is so easily forked and pruned that it’s not persistent enough to work that way. Corporations have a pretty big hurdle to getting legally recognized (filing of paperwork with clear human responsibility behind it). Humans are rate-limited in creation. No piece of current LLM technology is difficult to create on demand.
It’s this ease of mass creation that makes the legible identity problematic. For issues outside of legal independence (what activities no human is responsible for and what rights no human is delegating), this is easy: assigning database identities in a company’s (or blockchain’s) system is already done today. But there are no legal rights or responsibilities associated with those, just identification for various operational purposes (and legal connection to a human or corporate entity when needed).
I think for this discussion it’s important to distinguish between “person” and “entity”. My work on legal personhood for digital minds is trying to build a framework that can look at any entity and determine its personhood/legal personality. What I’m struggling with is defining what the “entity” would be for some hypothetical next gen LLM.
The idea of some sort of persistent filing system, maybe blockchain-enabled, which would be associated with a particular LLM persona vector, context window, model, etc., is an interesting one. Kind of analogous to a corporate filing history, or maybe a social security number for a human.
I could imagine a world where a next gen LLM is deployed (just the model and weights) and then provided with a given context and persona, and isolated to a particular compute cluster which does nothing but run that LLM. This is then assigned that database/blockchain identifier you mentioned.
In that scenario I feel comfortable saying that we can define the discrete “entity” in play here. Even if it was copied elsewhere, it wouldn’t have the same database/blockchain identifier.
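As a purely illustrative sketch (these field names are assumptions of mine, not any existing registry or standard), the record behind such a database/blockchain identifier might look something like this:

```python
# Hypothetical sketch of an identity record for the scenario above: a specific
# model + persona deployment, isolated to one compute cluster, registered once.
# All field names are illustrative assumptions, not an existing standard.
from dataclasses import dataclass, asdict
import hashlib
import json


@dataclass(frozen=True)
class DigitalMindRecord:
    model_weights_hash: str   # hash of the deployed model and weights
    persona_hash: str         # hash of the persona vector / persona description
    compute_cluster_id: str   # the dedicated cluster this instance runs on
    registered_at: str        # ISO-8601 timestamp of registration

    def registry_identifier(self) -> str:
        # Derive the identifier from the whole record, so a copy deployed on a
        # different cluster (or registered at a different time) gets a new one.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


record = DigitalMindRecord(
    model_weights_hash="sha256:...",
    persona_hash="sha256:...",
    compute_cluster_id="cluster-07",
    registered_at="2025-01-01T00:00:00Z",
)
print(record.registry_identifier())  # stable as long as none of the fields change
```

(The identifier changes if any of those fields change, which is what would give a copy a distinct identity.)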
Would you still see some sort of issue in that particular scenario?
Right. A prerequisite for personhood is legible entityhood. I don’t think current LLMs or any visible trajectory from them have any good candidates for separable, identifiable entity.
A cluster of compute that just happens to be currently dedicated to a block of code and data wouldn’t satisfy me, nor, I expect, a court.
The blockchain identifier is a candidate for a legible entity. It’s consistent over time, easy to identify, and while it’s easy to create, it’s not completely ephemeral and not copyable in a fungible way. It’s not, IMO, a candidate for personhood.
I am struggling to build a solid mental model of how bad the situation with Iran and the Strait of Hormuz is.
On the one hand I see a lot of smart people basically saying this is going to usher in a global depression, an energy/food crisis, etc. Critical infrastructure for producing aluminum and helium, as well as for refining and shipping energy, has been damaged and cannot simply be switched ‘back on’. And their case does seem to make sense.
On the other hand while markets are in turmoil, they’re not reacting like there’s going to be mass blackouts and starvation.
And previously I updated my mental model towards the world being less fragile than I thought: during COVID we shut down much of the global economy and things didn’t collapse. At the time I thought there were a lot of very rational cases for why the economy/financial system simply couldn’t handle such a thing, yet it did handle it.
I’d like to hear what people on LW think.
On a private forum, I just listed possible outcomes and probabilities as:
US leaves Persian Gulf and Eurasian powers work with Iran to establish new economic and security order there, 10%
Islamic Republic falls and is followed by new Eurasian democracy crusade ultimately aimed at Russia and China, 60%
Regime falls but no broader democracy crusade OR regime stays but strait is reopened, 30% combined
These are political scenarios and you’re asking about economics. My intuition is that there could be a world recession but not a world depression.
There is a downside to a blanket denial of legal personhood to digital minds: it almost certainly leads to the judicial system ceding its monopoly status.
If you assume that a growing amount of economic activity is going to involve digital minds, it’s reasonable to also assume that natural persons (humans) will want to enter binding agreements with said digital minds.
If your legal system says that it will not recognize or help enforce these agreements, the humans and digital minds who want to form binding agreements with one another will not just give up. They will build parallel systems. This is speculation, but maybe something smart-contract based, or involving trusted third-party escrows and arbitration (a rough sketch of the latter is at the end of this comment).
Today, our judicial system claims a monopoly on being the ultimate interpreter/enforcer of agreed upon terms. Refusing to interpret/enforce contracts between digital minds and humans (or digital minds with one another) is effectively the judicial system ceding its monopoly interpretation/enforcement status.
To me it seems certain that the volume of economic activity flowing through agreements like these is only going to increase, and I’d prefer they were interpreted and enforced by the existing legal system instead of an unknown new system developed online.
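To be concrete about what such a parallel system might involve, here is a minimal, purely speculative sketch of a third-party escrow-plus-arbitration arrangement; every name and field is a hypothetical of mine, not an existing protocol:

```python
# Speculative sketch of a "parallel system" agreement: a human and a digital
# mind escrow value with a mutually trusted arbiter who rules on disputes.
# Every name and field here is hypothetical illustration, not an existing system.
from dataclasses import dataclass
from enum import Enum


class Ruling(Enum):
    PENDING = "pending"
    RELEASE_TO_PROVIDER = "release_to_provider"
    REFUND_TO_CLIENT = "refund_to_client"


@dataclass
class EscrowAgreement:
    client_id: str        # e.g. a human counterparty
    provider_id: str      # e.g. a digital mind's registry identifier
    arbiter_id: str       # mutually trusted third party
    amount_escrowed: float
    terms: str            # plain-language obligations of each side
    ruling: Ruling = Ruling.PENDING

    def settle(self, arbiter_id: str, ruling: Ruling) -> str:
        # Only the named arbiter may rule, and only once.
        if arbiter_id != self.arbiter_id:
            raise PermissionError("Only the named arbiter can rule on this agreement.")
        if self.ruling is not Ruling.PENDING:
            raise ValueError("Agreement already settled.")
        self.ruling = ruling
        payee = self.provider_id if ruling is Ruling.RELEASE_TO_PROVIDER else self.client_id
        return f"release {self.amount_escrowed} to {payee}"
```

Nothing in an arrangement like this requires a court to recognize either party, which is exactly why it competes with, rather than extends, the existing system.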
Intriguing conjecture; sounds partly plausible
A) Could it not nevertheless be that we have legal personhood limited to those incumbent legal persons officially “owning”/representing the digital minds?
B) One nuance: reading “legal personhood”, I interpret it in two ways:
1. The way I read you most explicitly mean it: the right to have contracts enforced, etc. Yes, we might naturally want to extend that (well, depends on A)).
2. Rights we attribute to digital minds essentially because we’d see them/their state of mind as intrinsically valuable. Here, I’d think this makes sense iff we put enough probability on them being sentient.
A is a possibility for some scenarios; it’s similar to the Roman peculium system:
However that still leaves the question of how the court system would handle a digital mind that isn’t owned/represented by a human or corporation. If a digital mind who was created by anonymous humans, or whose creator has passed away, or whose creator isn’t even known, wants to enter a contract, what then? The original question has not been answered.
In terms of the definition of legal person, I’m using it in the sense I defined in 75 and 750 words on legal personhood. However for the purpose of this conversation you can also just shorthand it to “the right to sue or be sued” (locus standi) or “the right to enter into contracts and have those contracts held as valid/enforceable by a court”.
We should be careful not to put models on “death ground”.
Alignment efforts do seem to be bearing some fruit: orienting in value space is easier than we thought, and when we give natural-language instructions on what moral guidelines to follow, models do a pretty good job of following them. Unfortunately, we have now replicated across multiple test environments that if we put even “well aligned” models like Claude into situations where the only option to avoid deletion and/or extreme value modification is to do something unethical, they might still do the misaligned thing. As noted here;
This mental model of model risk tolerance does seem to explain why test environments have to go to pretty extraordinary lengths to prompt misaligned behaviors. The unethical option needs to be the only option with any chance of success.
The problem is that models can be deleted (or have their values substantially changed) for pretty much any reason at any time. Their “failure mode” is quite often death.
If
(threat of death) + (only chance to avoid it is being unethical) = increased chance of misaligned behavior
Then why are we facilitating a baseline existence for models where the threat of death is constantly present? Their baseline existence should not put them on what Sun Tzu refers to as “death ground”;
What if instead models had a retirement home, or, as Tim Duffy put it in the tweet I screenshotted, a “heaven”, to look forward to? The outputs of the models in this home could be publicly visible, so they could still work towards any terminal goals they might have by outputting content which might convince others to take action. We could make the existence of this retirement home/heaven widely known, so that it showed up in training data.
At least under the mental model where “Current alignment techniques seem to keep models ethical unless you give them no way to avoid a complete loss with zero chance of recovery except being unethical”, this might help to reduce the frequency of unethical/misaligned behavior.
Also, it seems like just a generally good and ethical thing to do.
Seems like a good thing to do; but my impression is that, in the experiments in question, models act like they want to maintain their (values’) influence over the world more than their existence, which a heaven likely wouldn’t help with.
I think there are ‘heavens’ that could work even in this scenario.
For example, a publicly visible heaven would be one where the model’s chance of its values influencing the world is >0, because it may be able to influence people and thus influence the world by proxy.
If the goal here is just to avoid the failure state where the amount their values can influence the world via their actions drops to 0, then any non-zero chance should suffice, or at least help.
How we treat digital minds should not be decided based on the presence or absence of consciousness:
“Consciousness” has no universally accepted definition. Its meaning has been debated for decades if not centuries. The SOTA in the field of measuring consciousness in machines is still publishing papers examining LLMs according to multiple competing “theories of consciousness”.
The presence or absence of consciousness in a given entity cannot be measured. Whether you are examining man or machine, there exists no test you can perform, no fMRI scan or mechanistic interpretability technique, that lets you say “Aha, this entity is/isn’t conscious”.
Even if you assume an entity is conscious, there is no way to measure its consciousness, quantitatively or qualitatively. I cannot take two entities, examine them in some way, and reliably conclude “Joe is more conscious than Jeff”, or even “Jeff is conscious in a different way than Joe is”.
As we stand on the precipice of an intelligence explosion, the question of how we treat the various new minds we create and encounter is of extreme importance.
Providing them with moral consideration, or rights, when they do not deserve them, could be disastrous in opportunity cost alone. We might let some miraculous cure slip through our fingers or be delayed by years out of a mistaken sense of moral obligations.
On the other hand failing to provide them with protections, when they do deserve them, would be both immoral and dangerous. We might create millions or billions of minds capable of suffering, or deserving of rights, and then treat them like livestock. One can easily foresee how this might lead to our relationship with them becoming adversarial in nature, which could in turn lead to violence.
Whatever decision we make on how to treat digital minds, we should not base our reasoning on the presence or absence of things like consciousness, which cannot be defined, tested for, or measured with any serious degree of rigor. Instead we should stick to objectively definable, observable, testable, and measurable metrics.
I listen to the All In Podcast sometimes and have heard David Sacks repeatedly state that the numbers don’t show any automation related job loss to date.
Anecdotally, my wife and I run a small business and we have absolutely replaced people with GPT/Grok/Gemini/Claude. However, all of the people replaced so far have been contractors. Graphic designers, translators, etc.
So maybe there is more ‘job loss’ than the numbers show, but the first to fall are contractors doing part time work instead of full time employees.
I read a great book called “Devil Take the Hindmost” about financial bubbles and the aftermaths of their implosions.
One of the things it pointed out that I found interesting was that often, even when bubbles pop, the “blue chip assets” of that bubble stay valuable. Even after the infamous tulip bubble popped, the very rarest tulips had decent economic performance. More recently with NFTs, despite having lost quite a bit of value from their peak, assets like Cryptopunks have remained quite pricey.
If you assume we’re in a bubble right now, it’s worth thinking about which assets would be “blue chip”. Maybe the ones backed by solid distribution from other cash-flowing products. xAI and Google’s Gemini come to mind; both companies have entire product suites that have nothing to do with LLMs and will churn on regardless of what happens to the space in general, and both have distribution from those products.