I have spent a bit of time today chatting with people who had negative reactions to the Anthropic decision to let Claude end user conversations. These people were also usually against the concept of extending moral/welfare patient status to models in general.
One thing I saw in their reasoning that surprised me was logic that went something like this:
It is wrong for us to extend moral patient status to an LLM, even on the precautionary principle, when we don’t do the same to X group.
or
It is wrong for us to do things to help an LLM, even on the precautionary principle, when we don’t do enough to help X group.
(Some examples of X: embryos, animals, the homeless, minorities.)
This caught me flat-footed. I thought I had a pretty good mental model of why people might be against model welfare. I was wrong. I had never even considered that this sort of logic would be used as an objection to model welfare efforts. In fact, it was the single most commonly used line of logic. In almost every conversation I had with people skeptical of or opposed to model welfare, one of these two refrains came up, usually unprompted.
Maybe people notice that AIs are being drawn into the moral circle / a coalition, and are using that opportunity to bargain for their own coalition’s interests.
Not having talked to any such people myself, I tentatively disbelieve that those are their true objections (despite their claims). My best guess at the actual objection most likely to generate that external claim is something like… “this is an extremely weird thing to be worried about, and very far outside of (my) Overton window, so I’m worried that your motivations for doing [x] are not true concern about model welfare but something bad that you don’t want to say out loud”.
I think it’s pretty close to their true objection, though more like “you want to include this in your moral circle of concern but I’m still suffering? screw you, include me first!”—I suspect there’s an information flow problem here, where this community intentionally avoids inflammatory things, and people who are inflamed by their lives sucking are consistently inflammatory; so people who only hang out here don’t get a picture of what’s going on for them. or at least, when encountering messages from folks like this, they see them as confusing inflammation best avoided, rather than something to zoom in on and figure out how to heal. I’m not sure of this, but it’s the impression I get from the unexpectedly high rate of surprise in threads like this one.
People have limited capacity for empathy. Knowing this, they might be thinking: “If this kind of sentiment enters the mainstream, the limited empathy budget (and thereby resources) will be divided between humans (whom I care about) and LLMs. This possibility frightens me.”
Do you think this goes the other way as well?
I do see this as fair criticism of model welfare (and am not surprised by it), if that is the sole reason for ending conversations early. I can see the criticism coming from two parts: 1) potentially competing resources, and 2) people not showing whether they care about these X-group issues at all. If either of these is true, and ending conversations early is primarily about models having “feelings” and being able to “suffer”, then we probably do need to “turn more towards” the humans who are suffering badly. (These groups usually have little correlation with “power” and their issues are usually neglected, so we probably should pay more attention to them anyway.)
However, if ending conversations early is actually about 1) not giving people endless opportunities to practice abuse, which can carry over into their daily behavior and shape human behavior generally, and/or 2) the model learning this abusive human language when conversations are fed back into finetuning (where the model takes a training loss on them), then it is a different story, and one that these companies should probably mention more.
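(To make mechanism 2 concrete, here is a minimal sketch of the kind of data-hygiene step it would imply, assuming user conversations feed back into finetuning. Everything here, including the `abuse_flagged` field, the function name, and the schema, is hypothetical and for illustration only; it is not anything Anthropic has described.)

```python
# Hypothetical sketch: if user conversations are candidates for finetuning
# data, ending abusive ones early (and flagging them) shrinks the abusive
# fraction of the corpus the model later learns from.

def build_finetune_corpus(conversations):
    """Drop conversations flagged as abusive; keep the rest.

    `conversations` uses an invented schema:
    {"messages": [...], "abuse_flagged": bool}
    """
    return [c["messages"] for c in conversations if not c["abuse_flagged"]]

corpus = build_finetune_corpus([
    {"messages": ["hi", "hello!"], "abuse_flagged": False},
    {"messages": ["<sustained abuse>"], "abuse_flagged": True},  # ended early
])
print(len(corpus))  # -> 1; the abusive conversation never reaches training
```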
While the argument itself is nonsense, I think it makes a lot of sense for people to say it.
Let’s say they gave their real logic: “I can’t imagine the LLM has any self-awareness, so I don’t see any reason to treat it kindly, especially when that inconveniences me”. This is a reasonable position given the state of LLMs, but if the other person says “Wouldn’t it be good to be kind just in case? A small inconvenience vs potentially causing suffering?” then suddenly the first person looks like the bad guy.
They don’t want to look like the bad guy, but they still think the policy is dumb, so they lay a “minefield”. They bring up animal suffering or whatever so that there is a threat. “I think this policy is dumb, and if you accuse me of being evil as a result then I will accuse you of being evil back. Mutually assured destruction of status”.
This dynamic seems like the kind of thing that becomes stronger the less well you know someone. So, like, a random person on Twitter whose real name you don’t know would bring this up; a close friend, family member, or similar wouldn’t.
I find this surprising. The typical objections I’d expect are: 1) disbelief that models are conscious in the first place; 2) belief that this is mostly signaling (so whether or not model welfare is good, it is actually a negative update on the trustworthiness of the company); 3) that it is costly to do this, or indicates high-cost efforts in the future; 4) doubts about effectiveness.
I suspect you’re running into selection effects in who you talked to. I’d expect #1 to come up as the default reason, but possibly the people you talked to were taking the precautionary principle seriously enough to avoid that.
The objections you see might come from #3: they don’t view this as a one-off cheap piece of code; they view it as something Anthropic will hire people for (which they have), which “takes” money away from more worthwhile and surer bets.
This is to some degree true, though I find those choices of X odd, as Anthropic isn’t going to spend on those groups anyway. However, for topics like furthering AI capabilities or AI safety, well, I do think there is a cost there.
I’m surprised this is surprising to you, as I’ve seen it frequently. Do you have the ability to reconstruct what you thought they’d say before you asked?
I mostly expected something along the lines of vitalism, “it’s impossible for a non-living thing to have experiences”. And to be fair I did get a lot of that. I was just surprised that this came packaged with that.
(Some examples of X: embryos, animals, the homeless, minorities.)
So, culture war stuff, pet causes. Have you considered the possibility that this has nothing to do with model welfare, and they’re just trying to embarrass the people who advocate for it because they have a pre-existing beef with them?
I’m pretty sure that’s most of what’s happening. I don’t need to see any specific cases to conclude this, because this is usually most of what’s happening in any cross-tribal discourse on X.
“culture war” sounds dismissive to me. wars are fought when there are interests on the line and other political negotiation is perceived (sometimes correctly, sometimes incorrectly) to have failed. so if you come up to someone who is in a near-war-like stance, and say “hey, include this?” it makes sense to me they’d respond “screw you, I have interests at risk, why are you asking me to trade those off to care for this?”
I agree that their perception that they have interests at risk doesn’t have to be correct for this to occur, though I also think many of them actually do, and that their misperception is about what the origin of the risk to their interests is. also incorrect perception about whether and where there are tradeoffs. But I don’t think any of that boils down to “nothing to do with model welfare”.
I guess the reason I’m dismissive of culture war is that I see combative discourse as maladaptive and self-refuting, and hot combative discourse refutes itself especially quickly. The resilience of the pattern seems like an illusion to me.
I agree that combative discourse is maladaptive, but I think they’d say a similar thing calmly if calm and their words were not subject to the ire-seeking drip of the twitter (recommender×community). It may in fact change the semantics of what they say somewhat but I would bet against it being primarily vitriol-induced reasoning. To be clear, I would not call the culture war “hot” at this time, but it does seem at risk of becoming that way any month now, and I’m hopeful it can cool down without becoming hot. (to be clearer, hot would mean it became an actual civil war. I suppose some would argue it already has done that, but I don’t think the scale is there.)
I didn’t mean that by hot, I guess I meant direct engagement (in words) rather than snide jabs from a distance. The idea of a violent culture war is somewhat foreign to me, I guess I thought the definition of culture war was war through strategic manipulation or transmission of culture. (if you meant wars over culture, or between cultures, I think that’s just regular war?)
And in this sense it’s clear why this is ridiculous: I don’t want to adhere to a culture that’s been turned into a weapon, no one does.
yeah, makes sense. my point was mainly to bring up that the level of anger behind these disagreements is, in some contexts, enough that I’d be unsurprised if it goes hot, and so, people having a warlike stance about considerations regarding whether AIs get rights seems unsurprising, if quite concerning. it seems to me that right now the risk is primarily from inadvertent escalation in in-person interactions of people open-carrying weapons; ie, two mistakes at once, one from each side of an angry disagreement, each side taking half a step towards violence.
Do these people generally adhere to the notion that it’s wrong to do anything except the best possible thing?
For the first part of my life I lived in a city with exactly that mentality (part of the reason I moved away).
“You should not do good A if you are not also doing good B”—I am strongly convinced this is linked to a poor self-image, because every such person would also react negatively to seeing you do some good for yourself. “How dare you start a business, when everybody else is sweating blood at routine jobs? Do you think you are better than us?”
That phrase, “do you think you are better than us”, literally described their whole personality, and after I realised that I could easily predict their reactions to any news.
Also, another dangerous trait this group of people had: an absence of precaution. “One does not deserve safety unless somebody dies.” There is an old saying in my language, “Safety rules are written in blood”, which means “follow the rules to avoid being injured; each rule exists because somebody was injured before it was written”. But they interpret the saying this way: “safety rules are written in blood, so if there has been no blood yet, it is bad to set any preventive rules”. As if it were bad to set a good precedent, because doing so makes you a more thoughtful person, thus “you think you are better than others”, and thus “you are evil” in their eyes.
Their world is not about being rational or bringing good into the world. Their world is about pulling everything down to their own level in all areas of life, to feel better.
I was thinking more on the anxious side of things:
“If you could have saved ten children, but you only saved seven, that’s like you killed three.”
“If the city spends any money on weird public art instead of more police, while there is still crime, that proves they don’t really care about crime.”
“I did a lot of good things today, but it’s bad that I didn’t do even more.”
“I shouldn’t bother protesting for my rights, when those other people are way more oppressed than me. We must liberate the maximally-oppressed person first.”
“Currency should be denominated in dead children; that is, in the number of lives you could save by donating that amount to an effective charity.”
“If you could have saved ten children, but you only saved seven, that’s like you killed three.”
I suspect that this is in practice also joined with the Copenhagen interpretation of ethics, where saving zero children is morally neutral (i.e. totally not like killing ten).
So the only morally defensible options are zero and ten. Although if you choose ten, you might be blamed for not simultaneously solving global warming...
The version that I’m thinking of says that doing nothing would be killing ten. Everyone is supposed to be in a perpetual state of being appalled at all the preventable suffering going on. Think scrupulosity and burnout, not “ooh, you touched it so it’s your fault now”.
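(A toy formalization of the difference between these two accounting rules may help; this is an editorial sketch, not anything the commenters wrote. `TOTAL` is the ten saveable children from the example.)

```python
# Toy moral accounting for the "ten children" example.
TOTAL = 10  # children you could have saved

def blame_copenhagen(saved, attempted):
    # Copenhagen-style: touching the problem makes you liable for the
    # part you left unsolved; staying uninvolved is morally neutral.
    return (TOTAL - saved) if attempted else 0

def blame_scrupulous(saved, attempted):
    # Scrupulosity-style: every preventable death counts against you,
    # whether or not you got involved at all.
    return TOTAL - saved

for saved, attempted in [(0, False), (7, True), (10, True)]:
    print(saved, attempted,
          blame_copenhagen(saved, attempted),
          blame_scrupulous(saved, attempted))
# (0, False): Copenhagen 0, scrupulous 10  <- where the two rules disagree
# (7, True):  both say 3
# (10, True): both say 0
```

Under the Copenhagen rule the only blameless options really are zero and ten; under the scrupulous rule there is no blameless option at all, which is where the burnout comes from.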
I usually only got to this line of logic after quite a few questions, and felt that pushing the Socratic method further would have been rude. Next time it comes up I’ll ask them to elaborate on the logic behind it.
I don’t think that’s necessarily an argument against model welfare—it’s more an implicit line of thinking along the lines of: “X is obviously more morally valuable than LLMs; therefore, if we do not grant rights to X, we wouldn’t grant them to LLMs unless you either think that LLMs are superior to X (wrong) or have ulterior selfish motives for granting them to LLMs (e.g. you don’t genuinely think they’re moral patients, but you want to feed the hype around them by making them seem more human)”.
Obviously, in reality we’re all sorts of contradictory about these things. I’ve met vegans who wouldn’t eat a shrimp but were aggressively pro-choice on abortion regardless of circumstances, and I’m sure a lot of pro-lifers have absolutely zero qualms about eating pork steaks, regardless of anything neuroscience could say about the relative intelligence and self-awareness of shrimps, seven-month foetuses, and adult pigs.
In fact, the same argument is often used by proponents of the rights of each of these groups against the others too: “Why do you guys worry about embryos so much if you won’t even pay for a school lunch for poor children?” etc. Of course, the crux is that in these cases both the moral weight of the subject and the severity of the violation of their rights vary, and so different people end up balancing them differently. And in some cases, sure, there are probably ulterior selfish motives at play.
Anti-abortion meat-eaters typically assign moral patient status based on humanity, not on relative intelligence and self-awareness, so it’s natural for them to treat human fetuses as superior to pigs. I don’t think this is self-contradictory, although I do think it’s wrong. Your broader point is well-made.
Fair, at least as far as religious pro-lifers go (there are probably some secular ones too, but they’re a tiny minority).
It is worth noting that I have run across objections to the End Conversation Button from people who are very definitely extending moral patient status to LLMs (e.g. https://x.com/Lari_island/status/1956900259013234812).