I have great empathy and deep respect for the courage of the people currently on hunger strikes to stop the AI race. Yet, I wish they hadn’t started them: these hunger strikes will not work.
Hunger strikes can be incredibly powerful when there’s a just demand, a target who would either give in to the demand or be seen as a villain for not doing so, a wise strategy, and a group of supporters.
I don’t think these hunger strikes pass the bar. Their political demands are not what AI companies would realistically give in to because of a hunger strike by a small number of outsiders.
A hunger strike can bring attention to how seriously you perceive an issue. If you know how to make it go viral, that is; in the US, hunger strikes are rarely widely covered by the media. And even then, you are more likely to marginalize your views than to push them into the mainstream: if people don’t currently think halting frontier general AI development requires hunger strikes, a hunger strike won’t explain to them why your views are correct. That is not self-evident just from the description of the hunger strike, and so the hunger strike is not the right approach here and now.
Also, our movement does not need martyrs. You can be a lot more helpful if you eat well, sleep well, and are able to think well and hard. Your life is also very valuable; it is part of what we’re fighting for; saving a world without you is slightly sadder than saving a world with you; and perhaps more importantly to you, your sacrifice will not help. For a sacrifice like this to help, it needs to already be seen by the public as legitimate, so that it makes people more sympathetic towards your cause and exerts pressure; and it needs to target decision makers who have the means to give in and advance your cause by doing so, for it to have any meaning at all.
At the moment, these hunger strikes are people vibe-protesting. They feel like some awful people are going to kill everyone, they feel powerless, and so they find a way to do something that they perceive as having a chance of changing the situation.
Please don’t risk your life; especially, please don’t risk your life in this particular way that won’t change anything.
Action is better than inaction; but please stop and think of your theory of change for more than five minutes, if you’re planning to risk your life, and then don’t risk your life[1]; please pick actions thoughtfully and wisely and not because of the vibes[2].
You can do much more if you’re alive and well and use your brain.
Not to say that you shouldn’t be allowed to risk your life for a large positive impact. I would sacrifice my life for some small chance of preventing AI risk. But most people who think they’re facing a choice to sacrifice their life for some chance of making a positive impact are wrong and don’t actually face it; so I think the bar for risking one’s life should be very high. In particular, when people have time to carefully do the math, I really want them to carefully do the math before deciding to risk their lives, and in this specific case, some of my frustration is from the people clearly getting their math wrong.
I think as a community, we also would really want to make people err on the side of safety, and have a strong norm of assuming that most people who decide to sacrifice their lives got their math wrong, especially if a community that shares their values disagrees with them about the consequences of the sacrifice. People really shouldn’t be risking their lives without having carefully thought through the theory of change (when they have the ability to do so).
I’d bet that if we found people competent in how movements achieve their goals, they would say that these particular hunger strikes are not great; and I expect that to be the case most of the time when individuals who share values with a larger movement decide to go on a hunger strike even as the larger movement thinks it would not be effective.
My strong impression is that the person on the hunger strike in front of Anthropic is doing this primarily because he feels like it is the proper thing to do in this situation, like it’s the action someone should be taking here.
Hi Mikhail, thanks for offering your thoughts on this. I think having more public discussion on this is useful and I appreciate you taking the time to write this up.
I think your comment mostly applies to Guido in front of Anthropic, and not our hunger strike in front of Google DeepMind in London.
Hunger strikes can be incredibly powerful when there’s a just demand, a target who would either give in to the demand or be seen as a villain for not doing so, a wise strategy, and a group of supporters.
I don’t think these hunger strikes pass the bar. Their political demands are not what AI companies would realistically give in to because of a hunger strike by a small number of outsiders.
I don’t think I have been framing Demis Hassabis as a villain, and if you think I did, it would be helpful to add a source for why you believe this.
I’m asking Demis Hassabis to “publicly state that DeepMind will halt the development of frontier AI models if all the other major AI companies agree to do so.” which I think is a reasonable thing to state given all the public statements he has made regarding AI Safety. I think that is indeed something that a company such as Google DeepMind would give in to.
A hunger strike can bring attention to how seriously you perceive an issue. If you know how to make it go viral, that is; in the US, hunger strikes are rarely widely covered by the media.
I’m currently in the UK, and I can tell you that there have already been two pieces published on Business Insider. I’ve also given three interviews in the past 24 hours to journalists to contribute to major publications. I’ll try to add links later if / once these get published.
At the moment, these hunger strikes are people vibe-protesting. They feel like some awful people are going to kill everyone, they feel powerless, and so they find a way to do something that they perceive as having a chance of changing the situation.
Again, I’m pretty sure I haven’t framed people as “awful”, and it would be great if you could provide sources for that statement. I also don’t feel powerless. My motivation for doing this was in part to provide support to Guido’s strike in front of Anthropic, which feels more like helping an ally, joining forces.
I find it actually empowering to be able to be completely honest about what I actually think DeepMind should do to help stop the AI race and receive so much support from all kinds of people on the street, including employees from Google, Google DeepMind, Meta and Sony. I am also grateful to have Denys with me, who flew from Amsterdam to join the hunger strike, and all the journalists who have taken the time to talk to us, both in person and remotely.
Action is better than inaction; but please stop and think of your theory of change for more than five minutes, if you’re planning to risk your life, and then don’t risk your life[1]; please pick actions thoughtfully and wisely and not because of the vibes[2].
I agree with the general point that taking decisions based on an actual theory of change is a much more effective way to have an impact in the world. I’ve personally thought quite a lot about why doing this hunger strike in front of DeepMind is net good, and I believe it’s having the intended impact, so I disagree with your implication that I’m basing my decisions on vibes. If you’d like to know more, I’d be happy to talk to you in person in front of the DeepMind office or remotely.
Now, taking a step back and considering Guido’s strike, I want to say that even if you think his actions were reckless and based on vibes, it’s worth evaluating whether his actions (and their consequences) will eventually turn out to be net negative. For one, I don’t think I would have been out in front of DeepMind as I type this if it were not for Guido’s action, and I believe what we’re doing here in London is net good. But most importantly, we’re still at the start of the strikes, so it’s hard to tell what will happen as this continues. I’d be happy to have this discussion again at the end of the year, looking back.
Finally, I’d like to acknowledge the health risks involved. I’m personally looking after my health, and there are some medics at King’s Cross who would be willing to help quickly if anything extreme were to happen. And given the length of the strikes so far I think what we’re doing is relatively safe, though I’m happy to be proven otherwise.
your comment mostly applies to Guido in front of Anthropic
Yep!
I don’t think I have been framing Demis Hassabis as a villain
A hunger strike is not a good tool if you don’t want to paint someone as a villain in the eyes of the public when they don’t give in to your demand.
publicly state that DeepMind will halt the development of frontier AI models if all the other major AI companies agree to do so.” which I think is a reasonable thing to state
It is vanishingly unlikely that all other major AI companies would agree to do so without the US government telling them to; this statement would be helpful, but only to communicate their position and not because of the commitment itself. Why not ask them to ask the government to stop everyone (maybe conditional on China agreeing to stop everyone in China)?
I’ve also given three interviews in the past 24 hours to journalists to contribute to major publications
If any of them go viral in the US with a good message, I’ll (somewhat) change my mind!
I disagree with your implication that I’m basing my decisions on vibes
This was mainly my impression after talking to Guido; but do you want to say more about the impact you think you’ll have?
I’d be happy to have this discussion again at the end of the year, looking back
(Can come back to it at the end of the year; if you have any advance predictions, they might be helpful to have posted!)
And given the length of the strikes so far I think what we’re doing is relatively safe, though I’m happy to be proven otherwise
I hope you remain safe and are not proven otherwise! Hunger strikes do carry real risks, though. Do you have particular plans for how long to stay on the hunger strike?
A hunger strike is not a good tool if you don’t want to paint someone as a villain in the eyes of the public when they don’t give in to your demand.
Is there any form of protest that doesn’t implicitly suggest that the person you’re protesting is doing something wrong? When the wrong in question is “causing human extinction”, it seems to me kind of hard for that not to automatically be seen as ‘villainous’.
(Asking genuinely; I think it quite probable that the answer is ‘yes’.)
Something like: hunger strikes are optimized hard specifically for painting someone as a villain, because that someone decides to let a person suffer or die (or be inhumanely fed). This is different from other forms of protest that are more focused on, e.g., arguing that specific decisions are bad and should be revoked, but that don’t necessarily try to make people perceive the other side as evil.
I don’t really see the problem with painting people as evil in principle, given that some people are evil. You can argue against it in specific cases, but I think the case for AI CEOs being evil is strong enough that it can’t be dismissed out of hand.
The case in question is “AI CEOs are optimising for their short-term status/profits, and for believing things about the world which maximise their comfort, rather than doing the due diligence required of someone in their position, which is to seriously check whether their company is building something which kills everyone”
Whether this is a useful frame for one’s own thinking—or a good frame to deploy onto the public—I’m not fully sure, but I think it does need addressing. Of course it might also differ between CEOs. I think Demis and Dario are two of the CEOs who it’s relatively less likely to apply to, but also I don’t think it applies weakly enough for them to be dismissed out of hand even in their cases.
“People are on hunger strikes” is not really a lot of evidence for “AI CEOs are optimizing for their short-term status/profits and are not doing the due diligence” in the eyes of the public.
I don’t think there’s any problem with painting people and institutions as evil in principle; I’m just not sure why you would want to do this here, as compared to other things. I would want people to have answers to how they imagine a hunger strike would paint AI companies/CEOs, and what the impact of that would be, because I expect little here that could move the needle.
It is vanishingly unlikely that all other major AI companies would agree to do so without the US government telling them to; this statement would be helpful, but only to communicate their position and not because of the commitment itself. Why not ask them to ask the government to stop everyone (maybe conditional on China agreeing to stop everyone in China)?
This seems to be exactly the point of the demand? This is a demand that would be cheap (perhaps even of negative cost) for DeepMind to accept (because the other AI companies wouldn’t agree to that), and would also be a major publicity win for the Pause AI crowd. Even counting myself skeptical of the hunger strikes, I think this is a very smart move.
The demand is that a specific company agrees to halt if everyone halts; this does not help in reality, because in fact it won’t be the case that everyone halts (absent government intervention).
Action is better than inaction; but please stop and think of your theory of change for more than five minutes,
I think there’s a very reasonable theory of change—X-risk from AI needs to enter the Overton window. I see no justification here for going to the meta-level and claiming they did not think for 5 minutes, which is why I have weak downvoted in addition to strong disagree.
This tactic might not work, but I am not persuaded by your supposed downsides. The strikers should not risk their lives, but I don’t get the impression that they are. The movement does need people who are eating, and thus able to work on AI safety research, governance, and other forms of advocacy. But why not this too? It seems very plausibly a comparative advantage for some concerned people, and particularly high leverage when very few are taking this step. If you think they should be doing something else instead, say specifically what it is and why these particular individuals are better suited to that particular task.
I see no justification here for going to the meta-level and claiming they did not think for 5 minutes
Michaël Trazzi’s comment, which he wrote a few hours before he started his hunger strike, isn’t directly about hunger striking but it does indicate to me that he put more than 5 minutes of thought into the decision, and his comment gestures at a theory of change.
I spoke to Michaël in person before he started. I told him I didn’t think the game theory worked out (if he’s not willing to die, GDM should ignore him; if he does die, then he’s worsening the world, since he can definitely contribute better by being alive, and GDM should still ignore him). I don’t think he’s going to starve himself to death or to serious harm, but that does make the threat empty. I don’t think that matters too much on game-theoretic-reputation grounds, though, since nobody seems to be expecting him to do that.
His theory of change was basically “If I do this, other people might” which seems to be true: he did get another person involved. That other person has said they’ll do it for “1-3 weeks” which I would say is unambiguously not a threat to starve oneself to death.
As a publicity stunt it has kinda worked in the basic sense of getting publicity. I think it might change the texture and vibe of the AI protest movement in a direction I would prefer it to not go in. It certainly moves the salience-weighted average of public AI advocacy towards Stop AI-ish things.
As Mikhail said, I feel great empathy and respect for these people. My first instinct was similar to yours, though - if you’re not willing to die, it won’t work, and you probably shouldn’t be willing to die (because that also won’t work / there are more reliable ways to contribute / timelines uncertainty).
I think ‘I’m doing this to get others to join in’ is a pretty weak response to this rebuttal. If they’re also not willing to die, then it still won’t work, and if they are, you’ve wrangled them in at more risk than you’re willing to take on yourself, which is pretty bad (and again, it probably still won’t work even if a dozen people are willing to die on the steps of the DeepMind office, because the government will intervene, or they’ll be painted as loons, or the attention will never materialize and their ardor will wane).
I’m pretty confused about how, under any reasonable analysis, this could come out looking positive EV. Most of these extreme forms of protest just don’t work in America (e.g. the soldier who self-immolated a few years ago). And if it’s not intended to be extreme, they’ve (I presume accidentally) misbranded their actions.
Fair enough. I think these actions are +ev under a coarse-grained model where some version of “Attention on AI risk” is the main currency (or a slight refinement to “Not-totally-hostile attention on AI risk”). For a domain like public opinion and comms, I think that deploying a set of simple heuristics like “Am I getting attention?”, “Is that attention generally positive?”, and “Am I lying or doing something illegal?” can be pretty useful.
Michael said on twitter here that he’s had conversations with two sympathetic DeepMind employees, plus David Silver, who was also vaguely sympathetic. This itself is more +ev than I expected already, so I’m updating in favour of Michael here.
It’s also occurred to me that if any of the CEOs cracks and at least publicly responds to the hunger strikers, then the CEOs who don’t do so will look villainous, so you actually only need one of them to respond to get a wedge in.
“Attention on AI risk” is a pretty bad proxy to optimize for, since the available tactics include the kind of attention that would be paid to luddites, lunatics, and crackpots caring about some issue.
The actions that we can take can:
Use what separates us from people everyone considers crazy: that our arguments check out and our predictions hold; communicate those;
Spark and mobilize existing public support;
Be designed to optimize for positive attention, not for any attention.
I don’t think DeepMind employees really changed their minds? Like, there are people at DeepMind with p(doom) higher than Eliezer’s; they would be sympathetic; would they change anything they’re doing? (I can imagine it prompting them to talk to others at DeepMind, talking about the hunger strike to validate the reasons for it.)
I don’t think Demis responding to the strike would make Dario look particularly villainous; happy to make conditional bets. How villainous someone looks here should be pretty independent, outside of, e.g., Demis responding and prompting a journalist to ask Dario, which takes plausible deniability away from him.
I’m also not sure how effective it would be to use this to paint the companies (or the CEOs—are they even the explicit targets of the hunger strikes?) as villainous.
To clarify, “think for five minutes” was an appeal to people who might want to do these kinds of things in the future, not a claim about Guido or Michael.
That said, I do in fact claim they have not thought carefully about their theory of change, and the linked comment from Michael lists very obvious surface-level reasons for why to do this in front of Anthropic and not OpenAI; I really would not consider this on the level of demonstrating having thought carefully about the theory of change.
I think there’s a very reasonable theory of change—X-risk from AI needs to enter the Overton window
While in principle, as I mentioned, a hunger strike can bring attention, this is not an effective way to do this for the particular issue that AI will kill everyone by default. The diff to communicate isn’t “someone is really scared of AI ending the world”; it’s “scientists think AI might literally kill everyone and also here are the reasons why”.
claiming they did not think for 5 minutes
This was not a claim about these people but an appeal to potential future people to maybe do research on this stuff before making decisions like this one.
That said, I talked to Guido prior to the start of the hunger strike, tried to understand his logic, and was not convinced he had any kind of reasonable theory of change guiding his actions, and my understanding is that he perceives it as the proper action to take, in a situation like that, which is why I called this vibe-protesting.
I don’t get the impression that they are
(It’s not very clear what would be the conditions for them to stop the hunger strikes.)
But why not this too?
Hunger strikes can be very effective and powerful if executed wisely. My comment expresses my strong opinion that this did not happen here, not that it can’t happen in general.
At the moment, these hunger strikes are people vibe-protesting.
I think I somewhat agree, but also I think this is a more accurate vibe than “yay tech progress”. It seems like a step in the right direction to me.
Please don’t risk your life; especially, please don’t risk your life in this particular way that won’t change anything.
Action is better than inaction; but please stop and think of your theory of change for more than five minutes, if you’re planning to risk your life, and then don’t risk your life; please pick actions thoughtfully and wisely and not because of the vibes.
You repeat a recommendation not to risk your life. Um, I’m willing to die to prevent human extinction. The math is trivial. I’m willing to die to reduce the risk by a pretty small percentage. I don’t think a single life here is particularly valuable on consequentialist terms.
There’s important deontology about not unilaterally risking other people’s lives, but this mostly goes away in the case of risking your own life. This is why there are many medical ethics guidelines that separate self-experimentation as a special case from rules for experimenting on others (and that’s been used very well in many cases and aligns incentives). I think one should have dignity and respect oneself, but I think there are many self-respecting situations where one should take major personal sacrifices and risk one’s whole life. (Somewhat similarly, there are many situations where one should risk being prosecuted unjustly by the state and spending a great deal of one’s life in prison.)
There’s important deontology about not unilaterally risking other people’s lives, but this mostly goes away in the case of risking your own life.
I don’t think so. I agree we shouldn’t have laws around this, but insofar as we have deontologies to correct for circumstances where historically our naive utility-maximizing calculations have been consistently biased, I think there have been enough cases of people uselessly martyring themselves for their causes to justify a deontological rule against sacrificing your own actual life.
Edit: Basically, I don’t want suicidal people to back-justify batshit insane reasons why they should die to decrease x-risk instead of getting help. And I expect these are the only people who would actually be at risk for a plan which ends with “and then I die, and there is 1% increased probability everyone else gets the good ending”.
At the time, South Vietnam was led by President Ngo Dinh Diem, a devout Catholic who had taken power in 1955, and then instigated oppressive actions against the Buddhist majority population of South Vietnam. This began with measures like filling civil service and army posts with Catholics, and giving them preferential treatment on loans, land distribution, and taxes. Over time, Diem escalated his measures, and in 1963 he banned flying the Buddhist flag during Vesak, the festival in honour of the Buddha’s birthday. On May 8, during Vesak celebrations, government forces opened fire on unarmed Buddhists who were protesting the ban, killing nine people, including two children, and injuring many more.
[...]
Unfortunately, standard measures for negotiation – petitions, street fasting, protests, and demands for concessions – were ignored by the Diem government, or met with force, as in the Vesak shooting.
[...]
Since conventional measures were failing, the Inter-Sect Committee decided to consider more extreme measures, including the idea of a voluntary self-immolation. While extreme, they hoped it would create an international media incident, to draw attention to the suffering of Buddhists in South Vietnam. They noted in their meeting minutes the power of photographs to focus international attention: “one body can reach where ten thousand leaflets cannot.” It was to be a Bodhisattva deed to help awaken the world.
[...]
On June 10, the Inter-Sect Committee contacted at least four Saigon-based members of the international media, telling them to be present for a “major event” that would occur the next morning. One of them was a photographer from the Associated Press, Malcolm Browne, who said he had “no idea” what he’d see, beyond expecting some kind of protest. When Thich Quang Duc and his attendants exited the car, Browne was 15 meters away, just outside the ring of chanting monks. Browne took more than 100 photos, fighting off nausea from the smell of burning gasoline and human flesh, and struggling with the horror, as he created a permanent visual record of Thich Quang Duc’s sacrifice.
The sacrifice was not in vain. The next day, Browne’s photos made the front page of newspapers around the world. They shocked people everywhere, and galvanized mass protests in South Vietnam. US President John F. Kennedy reportedly exclaimed “Jesus Christ!” upon first seeing the photo. The US government, which had been instrumental in installing and supporting the anti-communist Diem, withdrew its support, and just a few months later supported a coup that led to Diem’s death, a change in government, and the end of anti-Buddhist policy.
Nielsen also includes unsuccessful or actively repugnant examples of it.
The sociologist Michael Biggs has identified more than 500 self-immolations as protest in the four decades after Thich Quang Duc, most or all of which appear to have been inspired in part by Thich Quang Duc.
I’ve discussed Thich Quang Duc’s sacrifice in tacitly positive terms. But I don’t want to uncritically venerate this kind of sacrifice. As with Kravinsky’s kidney donation, while it had admirable qualities, it also had many downsides, and the value may be contested. Among the 500 self-immolations identified by Biggs, many seem pointless, even evil. For example: more than 200 people in India self-immolated in protest over government plans to reserve university places for lower castes. This doesn’t seem like self-sacrifice in service of a greater good. Rather, it seems likely many of these people lacked meaning in their own lives, and confused the grand gesture of the sacrifice for true meaning. Moral invention is often difficult to judge, in part because it hinges on redefining our relationship to the rest of the universe.
I also think this paragraph about Quang Duc is quite relevant:
Quang Duc was not depressed nor suicidal. He was active in his community, and well respected. Another monk, Thich Nhat Hanh, who had lived with him for the prior year, wrote that Thich Quang Duc was “a very kind and lucid person… calm and in full possession of his mental faculties when he burned himself.” Nor was he isolated and acting alone or impulsively. As we’ll see, the decision was one he made carefully, with the blessing of and as part of his community.
I’m not certain if there’s a particular point you want me to take away from this, but thanks for the information, and for including an unbiased sample from the article you linked. I don’t think I changed my mind much from reading this, though.
Do you also believe there is a deontological rule against suicide? I have heard rumor that most people who attempt suicide and fail, regret it. At the same time, I think some lives are worse than death (for example, see Amanda Luce’s Book Review: Two Arms And A Head that won the ACX book review prize), and so I believe it should be legal and sometimes supported, even if it were the case that most attempted suicides have been regretted.
I have heard rumor that most people who attempt suicide and fail, regret it.
After doing some research on this, I think this is unlikely to be true. The only quantitative study I found says that among its sample of suicide attempt survivors, 35.6% are glad to have survived, while 42.7% feel ambivalent, and 21.6% regret having survived. I also found a couple of sources agreeing with your “rumor”, but one cited just a suicide awareness trainer as its source, while the other cited the above study as the only evidence for its claim, somehow interpreting it as “Previous research has found that more than half of suicidal attempters regret their suicidal actions.” (Gemini 2.5 Pro says “It appears the authors of the 2023 paper misinterpreted or misremembered the findings of the 2005 study they cited.”)
If this “rumor” were true, I would expect to see a lot of studies supporting it, because such studies are easy to do and the result would be highly useful for people trying to prevent suicides (i.e., they could use it to convince potential suicide attempters that they’re likely to regret it). Evidence to the contrary is likely to be suppressed or not gathered in the first place, as almost nobody wants to encourage suicide. (The above study gathered the data incidentally, for a different purpose.) So everything seems consistent with the “rumor” being false.
Interesting, thanks. I think I had heard the rumor before and believed it.
In the linked study, it looks like they asked the people about regret very shortly after the suicide attempt. This could both bias the results towards less regret to have survived (little time to change their mind) or more regret to have survived (people might be scared to signal intent to retry suicide, for fear of being committed, which I think sometimes happens soon after failed attempts).
I think very very many people are not making an informed decision when they decide to commit suicide.
For example, I think quantum immortality is quite plausibly a thing. Very few people know about quantum immortality and even fewer have seriously thought about it. This means that almost everyone on the planet might have a very mistaken model of what suicide actually does to their anticipated experience.[1] Also, many people are religious and believe in a pleasant afterlife. Many people considering suicide are mentally ill in a way that compromises their decision making. Many people think transhumanism is impossible and won’t arrange for their brain to be frozen for that reason.
I agree that there is some threshold on the fraction of ill-considered suicides relative to total suicides such that suicide should be legal if we were below that threshold. I used to think we were maybe below that threshold. After I began studying physics at uni and so started taking quantum immortality more seriously, I switched to thinking we are maybe above the threshold.
You might find yourself in a branch where your suicide attempt failed, but a lot of your body and mind were still destroyed. If you keep exponentially decreasing the amplitude of your anticipated future experience in the universal wave function further, you might eventually find that it is now dominated by contributions from weird places and branches far-off in spacetime or configuration space that were formerly negligible, like aliens simulating you for some negotiation or other purpose.
I don’t really know yet how to reason well about what exactly the most likely observed outcome would be here. I do expect that by default, without understanding and careful engineering our civilisation doesn’t remotely have the capability for yet, it’d tend to be very Not Good.
This all feels galaxy-brained to me and like it proves too much. By analogy I feel like if you thought about population ethics for a while and came to counterintuitive conclusions, you might argue that people who haven’t done that shouldn’t be allowed to have children; or if they haven’t thought about timeless decision theory for a while they aren’t allowed to get a carry license.
I don’t think it proves too much. Informed decision-making comes in degrees, and some domains are just harder? Like, I think my threshold for leaving people free to make their own mistakes if they are the only ones harmed by them is very low, compared to where the human population average seems to be at the moment. But my threshold is, in fact, greater than zero.
For example, there are a bunch of things I think bystanders should generally prevent four-year-old human children from doing, even if the children insist that they want to do them. I know that stopping four-year-old children from doing these things will be detrimental in some cases, and that having such policies is degrading to the children’s agency. I remember what it was like being four years old and feeling miserable because of kindergarten teachers who controlled my day and thought they knew what was best for me. I still think the tradeoff is worth it on net in some cases.
I just think that the suicide thing happens to be a case where doing informed decision-making is maybe just too tough for way too many humans and thus some form of ban could plausibly be worth it on net. Sports betting is another case where I was eventually convinced that maybe a legal ban of some form could be worth it.
(I agree with Lucious in that I think it is important that people have the option of getting cryopreserved and also are aware of all the reality-fluid stuff before they decide to kill themselves.)
“Important” is ambiguous, in that I agree it matters, but it is quite another thing for this civilization to ban whole life options from people until they have heard about niche philosophy. Most people will never hear about niche philosophy.
I don’t think quantum immortality changes anything. You can rephrase this in terms of standard probability theory and condition on them continuing to have subjective experience, and still get to the same calculus.
However, only considering the branches in which you survive, or conditioning on having subjective experience after the suicide attempt, ignores the counterfactual suffering prevented in all the branches (or probability mass) in which you did die, which may be less unpleasant than the branches in which you survived, but are many many more in number! Ignoring those branches biases the reasoning toward rare survival tails that don’t dominate the actual expected utility.
I don’t think quantum immortality changes anything. You can rephrase this in terms of standard probability theory and condition on them continuing to have subjective experience, and still get to the same calculus.
I agree that quantum mechanics is not really central for this on a philosophical level. You get a pretty similar dynamic just from having a universe that is large enough to contain many almost-identical copies of you. It’s just that it seems at present very unclear and arguable whether the physical universe is in fact anywhere near that large, whereas I would claim that a universal wavefunction which constantly decoheres into different branches containing different versions of us is pretty strongly implied to be a thing by the laws of physics as we currently understand them.
However, only considering the branches in which you survive, or conditioning on having subjective experience after the suicide attempt, ignores the counterfactual suffering prevented in all the branches (or probability mass) in which you did die, which may be less unpleasant than the branches in which you survived, but are many many more in number! Ignoring those branches biases the reasoning toward rare survival tails that don’t dominate the actual expected utility.
It is very late here and I should really sleep instead of discussing this, so I won’t be able to reply as in-depth as this probably merits. But, basically, I would claim that this is not the right way to do expected utility calculations when it comes to ensembles of identical or almost-identical minds.
A series of thought experiments might maybe help illustrate part of where my position comes from:
1. Imagine someone tells you that they will put you to sleep and then make two copies of you, identical down to the molecular level. They will place you in a room with blue walls. They will place one copy of you in a room with red walls, and the other copy in another room with blue walls. Then they will wake all three of you up.
What color do you anticipate seeing after you wake up, and with what probability?
I’d say 2⁄3 blue, 1⁄3 red. Because there will now be three versions of me, and until I look at the walls I won’t know which one I am.
2. Imagine someone tells you that they will put you to sleep and then make two copies of you. One copy will not include a brain. It’s just a dead body with an empty skull. Another copy will be identical to you down to the molecular level. Then they will place you in a room with blue walls, and the living copy in a room with red walls. Then they will wake you and the living copy up.
What color do you anticipate seeing after you wake up, and with what probability? Is there a 1⁄3 probability that you ‘die’ and don’t experience waking up because you might end up ‘being’ the corpse-copy?
I’d say 1⁄2 blue, 1⁄2 red, and there is clearly no probability of me ‘dying’ and not experiencing waking up. It’s just a bunch of biomass that happens to be shaped like me.
3. As 2, but instead of creating the corpse-copy without a brain, it is created fully intact, then its brain is destroyed while it is still unconscious. Should that change our anticipated experience? Do we now have a 1⁄3 chance of dying in the sense that we might not experience waking up? Is there some other relevant sense in which we die, even if it does not affect our anticipated experience?
I’d say no and no. This scenario is identical to 2 in terms of the relevant information processing that is actually occurring. The corpse-copy will have a brain, but it will never get to use it, so it won’t affect my anticipated experience in any way. Adding more dead copies doesn’t change my anticipated experience either. My best-scoring prediction will be that I have 1⁄2 chance of waking up to see red walls, and 1⁄2 chance of waking up to see blue walls.
In real life, if you die in the vast majority of branches caused by some event, i.e. that’s where the majority of the amplitude is, but you survive in some, the calculation for your anticipated experience would seem to not include the branches where you die for the same reason it doesn’t include the dead copies in thought experiments 2 and 3.
(I think Eliezer may have written about this somewhere as well using pretty similar arguments, maybe in the quantum physics sequence, but I can’t find it right now.)
You get a pretty similar dynamic just from having a universe that is large enough to contain many almost-identical copies of you.
Again, not sure why a large universe is needed. The expected utility ends up the same either way, whether you have some fraction of branches in which you remain alive or some probability of remaining alive.
Regarding the expected utility calculus: I agree with everything you said, but I don’t see how any of it allows you to disregard the counterfactual suffering from not committing suicide in your expected value calculation.
Maybe the crux is whether we consider the utility of each “you” (i.e., the you in each branch) individually, and add them up for the total utility, or whether we consider all “you”s to have just one shared utility.
Let’s say that not committing suicide gives you −1 utility in n branches, but committing suicide gives you −100 utility in n/m branches and 0 utility in the remaining n − n/m branches.
If we treat all copies of you as having separate utilities and add them all up for a total expected utility calculation, not committing suicide gives −n utility while committing suicide leads to −100n/m utility. Therefore, as long as m > 100, it is better to commit suicide.
If, on the other hand, you treat them as having one shared utility, you get either −1 or −100 utility, and −100 is of course worse.
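To make the two framings explicit, here is the same comparison written out as a small expected-utility calculation (this only restates the numbers above; n and m are the branch counts already introduced):

$$
\begin{aligned}
\text{Separate, additive utilities:}\quad & EU(\text{no suicide}) = n\cdot(-1) = -n,\\
& EU(\text{suicide}) = \tfrac{n}{m}\cdot(-100) = -\tfrac{100n}{m},\\
& \text{so suicide comes out ahead iff } \tfrac{100n}{m} < n \;\Longleftrightarrow\; m > 100.\\
\text{One shared utility:}\quad & EU(\text{no suicide}) = -1,\qquad EU(\text{suicide}) = -100,\\
& \text{so not committing suicide is preferred for any } m.
\end{aligned}
$$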
Do you agree that this is the crux? If so, why do you think that all the copies share one utility rather than their utilities adding up?
In a large universe, you do not end. Not just in the sense of expectation over some branches versus others; you just continue, the computation that is you continues. When you open your eyes, you’re not likely to find yourself as a person in a branch computed only relatively rarely; still, that person continues, and does not die.
Attempted suicide reduces your reality-fluid (how much you’re computed and how likely you are to find yourself there), but you will continue to experience the world. If you die in a nuclear explosion, the continuation of you will be somewhere else, sort-of isekaied; and mostly you will find yourself not in a strange world that recovers the dead but in a world where the nuclear explosion did not happen; still, in a large world, even after a nuclear explosion, you continue.
You might care about having a lot of reality-fluid, because this makes your actions more impactful, because you can spend your lightcone better, and improve the average experience in the large universe. You might also assign negative utility to others seeing you die; they’ll have a lot of reality-fluid in worlds where you’re dead and they can’t talk to you, even as you continue. But I don’t think it works out to assigning the same negative utility to dying as in branches of small worlds.
Yes, but the number of copies of you still reduces (or the probability that you are alive in standard probability theory, or the number of branches in many worlds). Why are these not equivalent in terms of the expected utility calculus?
Imagine that you’re an agent in the Game of Life. Your world, your laws of physics, are computed on a very large number of independent computers, all performing the same computation.
You exist within the laws of causality of your world, computed as long as at least one server computes your world. If some of them stop performing the computation, it won’t be a death of a copy; you’ll just have one fewer instance of yourself.
You are of course right that there’s no difference between reality-fluid and normal probabilities in a small world: it’s just how much you care about various branches relative to each other, regardless of whether all of them will exist or only some.
I claim that the negative utility due to ceasing to exist is just not there, because you don’t actually cease to exist, in a way you reflectively care about, when you have fewer instances. For normal things (e.g., how much you care about paperclips), the expected utility is the same; but here, it’s the kind of terminal value that I expect would be different for most people: guaranteed continuation in 5% of instances is much better than a 5% chance of continuing in all instances; in the first case, you don’t die!
I claim that the negative utility due to ceasing to exist is just not there
But we are not talking about negative utility due to ceasing to exist. We are talking about avoiding counterfactual negative utility by committing suicide, which still exists!
guaranteed continuation in 5% of instances is much better than a 5% chance of continuing in all instances; in the first case, you don’t die!
I think this is an artifact of thinking of all of the copies having a shared utility (i.e. you) rather than separate utilities that add up (i.e. so many yous will suffer if you don’t commit suicide). If they have separate utilities, we should think of them as separate instances of yourself.
it’s the kind of terminal value that I expect would be different for most people: guaranteed continuation in 5% of instances is much better than a 5% chance of continuing in all instances; in the first case, you don’t die!
And even in the case where we are assigning negative utility to death, most people are really considering counterfactual utility from being alive, and 95% of that (expected) counterfactual utility is lost whether 95% of the “instances of you” die or whether there is a 95% chance that “you” die.
I think there is, and I think cultural mores well support this. Separately, I think we shouldn’t legislate morality and though suicide is bad, it should be legal[1].
At the same time, I think some lives are worse than death (for example, see Amanda Luce’s Book Review: Two Arms And A Head that won the ACX book review prize), and so I believe it should be legal and sometimes supported
There also exist cases where it is in fact correct from a utilitarian perspective to kill, but this doesn’t mean there is no deontological rule against killing. We can argue about the specific circumstances where we need these rule carve-outs (eg war), but I think we’d agree that when it comes to politics and policy, there ought to be no carve-outs, since people are particularly bad at risk-return calculations in that domain.
But also this would mean we have to deal with certain liability issues, e.g., if ChatGPT convinces a kid to kill themselves, we’d like to say this is manslaughter or homicide iff the kid otherwise would’ve gotten better, but how do we determine that? I don’t know, and probably on net we should choose freedom instead, or this isn’t actually much of a problem in practice.
Makes sense. I don’t hold this stance; I think my stance is that many/most people are kind of insane on this, but that like with many topics we can just be more sane if we try hard and if some of us set up good institutions around it for helping people have wisdom to lean on in thinking about it, rather than having to do all their thinking themselves with their raw brain.
(I weakly propose we leave it here, as I don’t think I have a ton more to say on this subject right now.)
At the moment, these hunger strikes are people vibe-protesting.
To clarify, I meant that the choice of actions was based on the vibes, on this seeming like the right thing to do in these circumstances, not on careful consideration.
You repeat a recommendation not to risk your life
I maybe formulated this badly.
I do not disagree with that part of your comment. I did, in fact, risk being prosecuted unjustly by the state and spending a great deal of my life in prison. I was also aware of the kinds of situations I’d want to go for hunger strikes in while in prison, though didn’t think about that often.
And I, too, am willing to die to reduce the risk by a pretty small chance.
Most of the time, though, I think people who think they have this choice don’t actually face it; I think the bar for risking one’s life should be very high. In particular, when people have time to carefully do the math, I really want them to carefully do the math before deciding to risk their lives, and in this particular case, some of my frustration is from the people getting their math wrong.
I think as a community, we also would really want to make people err on the side of safety, and have a strong norm of assuming that most people who decide to sacrifice their lives got their math wrong. People really shouldn’t be risking their lives without having carefully thought through the theory of change when they have the ability to do so.
Like, I’d bet that if we found people competent in how movements achieve their goals, they would say that these particular hunger strikes are not great; and I expect that to be the case most of the time when individuals who share values with a larger movement decide to go on a hunger strike even as the larger movement thinks that would not be effective.
I think I somewhat agree that these hunger strikes will not shut down the companies or cause major public outcry.
I think there is definitely something to be said for the idea that our society is very poor at doing real protesting, and will just do haphazard things and never do anything goal-directed. That’s potentially a pretty fundamental problem.
But setting that aside (which is a big thing to set aside!) I think the hunger-strike is moving in the direction of taking this seriously. My guess is most projects in the world don’t quite work, but they’re often good steps to help people figure out what does work. Like, I hope this readies people to notice opportunities for hunger strikes, and also readies them to expect people to be willing to make large sacrifices on this issue.
People do in fact try to be very goal-directed about protesting! They have a lot of institutional knowledge on it!
You can study what worked and what didn’t work in the past, and what makes a difference between a movement that succeeds and a movement that doesn’t. You can see how movements organize, how they grow local leaders, how they come up with ideas that would mobilize people.
A group doesn’t have to attempt a hunger strike to figure out what the consequences would be; it can study and think, and I expect that to be a much more valuable use of time than doing hunger strikes.
I’d be interested to read a quick post from you that argued “Hunger-strikes are not the right tool for this situation; here is what they work for and what they don’t work for. Here is my model of this situation and the kind of protests that do make sense.”
I don’t know much about protesting. Most of the recent ones that get big enough that I hear about them have been essentially ineffectual as far as I can recall (Occupy Wall Street, the Women’s March, No Kings). I am genuinely interested in reading about effective and clearly effective protests led by anyone currently doing protests, or within the last 10 years. Even if on a small scale.
(My thinking isn’t that protests have not worked in the past – I believe they have: MLK, Malcolm X, the Women’s Suffrage Movement, the Vietnam War protests, surely more – but that the current protesting culture has lost its way and is no longer effective.)
I am genuinely interested in reading about effective and clearly effective protests led by anyone currently doing protests, or within the last 10 years. Even if on a small scale.
“Protest movements could be more effective than the best charities”—SSIR
About two weeks ago, I published an article in Stanford Social Innovation Review (SSIR), a magazine for those interested in philanthropy, social science and non-profits. … Although my article is reasonably brief (and I obviously recommend reading it in full!) here’s a quick summary of what I spoke about, plus some nuances I forgot or wasn’t able to add:
...
There is a reasonable amount of evidence showing that protest movements can have significant impacts, across a variety of outcomes from policy, public opinion, public discourse, voting behaviour, and corporate behaviour. I’ll leave this point to be explained in greater detail in our summary of our literature review on protest outcomes!
...
3. A summary of Social Change Lab’s literature reviews, who we are, and our next steps
We’ve recently conducted two literature reviews, looking over 60+ academic studies across political science, sociology and economics, to tackle some key questions around protest movements. Specifically, we had two main questions:
What are the outcomes of protest and protest movements? - Literature Review
What factors make some protest movements more likely to succeed relative to others? - Literature Review
(Would be interested in someone going through this paper and writing a post or comment highlighting some examples and why they’re considered successful.)
Not quite responding to your main point here, but I’ll say that this position would seem valid to me and good to say if you believed it.
Some people make major self-sacrifices wisely, and others for poor reasons and due to misaligned social pressures. I feel that this is an example of the latter, so I do not endorse it, I think people who care about this issue should not endorse it, and I hope someone helps them and they stop.
I don’t know what personal life tradeoffs any of them are making, so I have a hard time speaking to that. I just found out that Michael Trazzi is one of the people doing a hunger strike; I don’t think it’s true of him that he hasn’t thought seriously about the issues given how he’s been intellectually engaged for 5+ years.
(Social movements (and comms and politics) are not easy to reason about well from first principles. I think Michael is wrong to be making this particular self-sacrifice, not because he hasn’t thought carefully about AI but because he hasn’t thought carefully about hunger strikes.)
Relevantly, if any of them actually die, and if also it does not cause major change and outcry, I will probably think they made a foolish choice (where ‘foolish’ means ‘should have known in advance this was the wrong call on a majorly important decision’).
My modal guess is that they will all make real sacrifice, and stick it out for 10-20 days, then wrap up.
On the object level it’s (also) important to emphasize that these guys don’t seem to be seriously risking their lives. At least one of them noted he’s taking vitamins, hydrating etc. On consequentialist grounds I consider this to be an overdetermined positive.
A hunger strike will eventually kill you even if you take vitamins, electrolytes, and sugar. (A way to prevent death despite the target not giving in is often a group of supporters publicly begging the person on the hunger strike to stop and not kill themselves for some plausible reason, but sometimes people ignore that and die.) I’m not entirely sure what Guido’s intention is if Anthropic doesn’t give in.
Sure, I just want to defend that it would also be reasonable if they were doing a more intense and targeted protest. “Here is a specific policy you must change” and “I will literally sacrifice my life if you don’t make this change”. So I’m talking about the stronger principle.
I don’t strongly agree or disagree with your empirical claims but I do disagree with the level of confidence expressed. Quoting a comment I made previously:
I’m undecided on whether things like hunger strikes are useful but I just want to comment to say that I think a lot of people are way too quick to conclude that they’re not useful. I don’t think we have strong (or even moderate) reason to believe that they’re not useful.
When I reviewed the evidence on large-scale nonviolent protests, I concluded that they’re probably effective (~90% credence). But I’ve seen a lot of people claim that those sorts of protests are ineffective (or even harmful) in spite of the evidence in their favor.[1] I think hunger strikes are sufficiently different from the sorts of protests I reviewed that the evidence might not generalize, so I’m very uncertain about the effectiveness of hunger strikes. But what does generalize, I think, is that many peoples’ intuitions on protest effectiveness are miscalibrated.
[1] This may be less relevant for you, Mikhail Samin, because IIRC you’ve previously been supportive of AI pause protests in at least some contexts.
ETA: To be clear, I’m responding to the part of your post that’s about whether hunger strikes are effective. I endorse the positive message of the second half of your post.
ETA 2: I read Ben Pace’s comment and he is making some good points so now I’m not sure I endorse the second half.
To be very clear, I expect large social movements that use protests as one of its forms of action to have the potential to be very successful and impactful if done well. Hunger strikes are significantly different from protests. Hunger strikes can be powerful, but they’re best for very different contexts.
Aside from whether or not the hunger strikes are a good idea, I’m really glad they have emphasized conditional commitments in their demands.
I think that we should be pushing on these much much more: getting groups to say “I’ll do X if abc groups do X as well”
And should be pushing companies/governments to be clear whether their objection is “X policy is net-harmful regardless of whether anyone else does it” vs “X is net-harmful for us if we’re the only ones to do it”
[I recognize that some of this pushing/clarification might make sense privately, and that groups will be reluctant to say stuff like this publicly because of posturing and whatnot.]
(While I like it being directed towards coordination, it would not actually make a difference, as it won’t be the case that all AI companies want to stop, and so it would still not be of great significance. The thing that works is a gov-supported ban on developing ASI anywhere in the world. A commitment to stop if everyone else stops doesn’t actually come into force unless everyone is required to stop anyway.
An ask that works is, e.g., “tell the government they need to stop everyone, including us”.)
I think we should show some solidarity to people committed to their beliefs and making a personal sacrifice, rather than undermining them by critiquing their approach.
Given that they’re both young men and it is occurring in a first world country, it seems unlikely anyone will die. But it does seem likely they or their friends will read this thread.
Beyond that, the hunger strike is only on day 2 and is has already received a small amount of media coverage. Should they go viral then this one action alone will have a larger differential impact on reducing existential risk than most safety researchers will achieve in their entire careers.
This is surprising to hear on LessWrong, where we value truth without having to think of object-level reasons for why it is good to say true things. But on the object level: it would be very dangerous for a community to avoid saying true things because it is afraid of undermining someone’s sacrifice; this would lead to a lot of needless, and even net-negative, sacrifice, without mechanisms for self-correction. Like, if I ever do something stupid, please tell me (and everyone) that instead of respecting my sacrifice: I would not want others to repeat my mistakes.
(There are lots of ways to get media coverage and it’s not always good in expectation. If they go viral, in a good way/with a good message, I will somewhat change my mind.)
Hi Mikhail, thanks for offering your thoughts on this. I think having more public discussion on this is useful and I appreciate you taking the time to write this up.
I think your comment mostly applies to Guido in front of Anthropic, and not our hunger strike in front of Google DeepMind in London.
I don’t think I have been framing Demis Hassabis as a villain and if you think I did it would be helpful to add a source for why you believe this.
I’m asking Demis Hassabis to “publicly state that DeepMind will halt the development of frontier AI models if all the other major AI companies agree to do so”, which I think is a reasonable thing to ask given all the public statements he has made regarding AI safety. I think that is indeed something that a company such as Google DeepMind would give in to.
I’m currently in the UK, and I can tell you that there have already been two pieces published on Business Insider. I’ve also given three interviews in the past 24 hours to journalists contributing to major publications. I’ll try to add links later if / once these get published.
Again, I’m pretty sure I haven’t framed people as “awful”, and it would be great if you could provide sources for that statement. I also don’t feel powerless. My motivation for doing this was in part to provide support to Guido’s strike in front of Anthropic, which feels more like helping an ally, joining forces.
I find it actually empowering to be able to be completely honest about what I actually think DeepMind should do to help stop the AI race and receive so much support from all kinds of people on the street, including employees from Google, Google DeepMind, Meta and Sony. I am also grateful to have Denys with me, who flew from Amsterdam to join the hunger strike, and all the journalists who have taken the time to talk to us, both in person and remotely.
I agree with the general point that making decisions based on an actual theory of change is a much more effective way to have an impact on the world. I’ve personally thought quite a lot about why doing this hunger strike in front of DeepMind is net good, and I believe it’s having the intended impact, so I disagree with your implication that I’m basing my decisions on vibes. If you’d like to know more, I’d be happy to talk to you in person in front of the DeepMind office or remotely.
Now, taking a step back and considering Guido’s strike, I want to say that even if you think his actions were reckless and based on vibes, it’s worth evaluating whether his actions (and their consequences) will eventually turn out to be net negative. For one, I don’t think I would be out in front of DeepMind as I type this if it were not for Guido’s action, and I believe what we’re doing here in London is net good. But most importantly, we’re still at the start of the strikes, so it’s hard to tell what will happen as this continues. I’d be happy to have this discussion again at the end of the year, looking back.
Finally, I’d like to acknowledge the health risks involved. I’m personally looking after my health, and there are some medics at King’s Cross who would be willing to help quickly if anything extreme were to happen. And given the length of the strikes so far, I think what we’re doing is relatively safe, though I’m happy to be proven otherwise.
Thanks for responding!
Yep!
A hunger strike is not a good tool if you don’t want to paint someone as a villain in the eyes of the public when they don’t give in to your demand.
It is vanishingly unlikely that all other major AI companies would agree to do so without the US government telling them to; this statement would be helpful, but only to communicate their position and not because of the commitment itself. Why not ask them to ask the government to stop everyone (maybe conditional on China agreeing to stop everyone in China)?
If any of them go viral in the US with a good message, I’ll (somewhat) change my mind!
This was mainly my impression after talking to Guido; but do you want to say more about the impact you think you’ll have?
(Can come back to it at the end of the year; if you have any advance predictions, they might be helpful to have posted!)
I hope you remain safe and are not proven otherwise! Hunger strikes do carry real health risks, though. Do you have particular plans for how long to be on the hunger strike for?
I have sent myself an email to arrive on December 20th to send you both a reminder about this thread.
Is there any form of protest that doesn’t implicitly communicate that the person you’re protesting against is doing something wrong? When the wrong in question is “causing human extinction”, it seems to me kind of hard for that not to automatically be read as ‘villainous’.
(Asking genuinely, I think it quite probably the answer is ‘yes’.)
Something like: hunger strikes are optimized hard specifically for painting someone as a villain, because the target is deciding to let someone suffer or die (or be inhumanely force-fed). This is different from other forms of protest that focus more on, e.g., specific decisions being bad and needing to be revoked, but don’t necessarily try to make people perceive the other side as evil.
I don’t really see the problem with painting people as evil in principle, given that some people are evil. You can argue against it in specific cases, but I think the case for AI CEOs being evil is strong enough that it can’t be dismissed out of hand.
The case in question is “AI CEOs are optimising for their short-term status/profits, and for believing things about the world which maximise their comfort, rather than doing the due diligence required of someone in their position, which is to seriously check whether their company is building something which kills everyone”
Whether this is a useful frame for one’s own thinking—or a good frame to deploy onto the public—I’m not fully sure, but I think it does need addressing. Of course it might also differ between CEOs. I think Demis and Dario are two of the CEOs who it’s relatively less likely to apply to, but also I don’t think it applies weakly enough for them to be dismissed out of hand even in their cases.
“People are on hunger strikes” is not really a lot of evidence for “AI CEOs are optimizing for their short-term status/profits and are not doing the due diligence” in the eyes of the public.
I don’t think there’s any problem with painting people and institutions as evil in principle; I’m just not sure why you would want to do this here, as compared to other things. I would also want people to have answers to how they imagine a hunger strike would paint AI companies/CEOs and what the impact of that would be, because I expect little here that could move the needle.
That is true. “People are on hunger strikes and the CEOs haven’t even commented” is (some) public evidence of “AI CEOs are unempathetic”.
I misunderstood your point, I thought you were arguing against painting individuals as evil in general.
This seems to be exactly the point of the demand? This is a demand that would be cheap (perhaps even of negative cost) for DeepMind to accept (because the other AI companies wouldn’t agree to that), and would also be a major publicity win for the Pause AI crowd. Even counting myself skeptical of the hunger strikes, I think this is a very smart move.
The demand is that a specific company agrees to halt if everyone halts; this does not help in reality, because in fact it won’t be the case that everyone halts (absent government intervention).
I don’t think the point of hunger strikes is to achieve immediate material goals, but publicity/symbolic ones.
I think there’s a very reasonable theory of change—X-risk from AI needs to enter the Overton window. I see no justification here for going to the meta-level and claiming they did not think for 5 minutes, which is why I have weak downvoted in addition to strong disagree.
This tactic might not work, but I am not persuaded by your supposed downsides. The strikers should not risk their lives, but I don’t get the impression that they are. The movement does need people who are eating, and therefore able to work on AI safety research, governance, and other forms of advocacy. But why not this too? It seems very plausibly a comparative advantage for some concerned people, and particularly high-leverage when very few are taking this step. If you think they should be doing something else instead, say specifically what it is and why these particular individuals are better suited to that particular task.
Michaël Trazzi’s comment, which he wrote a few hours before he started his hunger strike, isn’t directly about hunger striking but it does indicate to me that he put more than 5 minutes of thought into the decision, and his comment gestures at a theory of change.
I spoke to Michaël in person before he started. I told him I didn’t think the game theory worked out (if he’s not willing to die, GDM should ignore him; if he does die, then he’s worsening the world, since he can definitely contribute better by being alive, and GDM should still ignore him). I don’t think he’s going to starve himself to death or to serious harm, but that does make the threat empty. I don’t really think that matters too much on game-theoretic-reputation grounds, since nobody seems to be expecting him to do that.
His theory of change was basically “If I do this, other people might” which seems to be true: he did get another person involved. That other person has said they’ll do it for “1-3 weeks” which I would say is unambiguously not a threat to starve oneself to death.
As a publicity stunt it has kinda worked in the basic sense of getting publicity. I think it might change the texture and vibe of the AI protest movement in a direction I would prefer it to not go in. It certainly moves the salience-weighted average of public AI advocacy towards Stop AI-ish things.
As Mikhail said, I feel great empathy and respect for these people. My first instinct was similar to yours, though - if you’re not willing to die, it won’t work, and you probably shouldn’t be willing to die (because that also won’t work / there are more reliable ways to contribute / timelines uncertainty).
I think ‘I’m doing this to get others to join in’ is a pretty weak response to this rebuttal. If they’re also not willing to die, then it still won’t work; and if they are, you’ve wrangled them in at more risk than you’re willing to take on yourself, which is pretty bad (and again, it probably still won’t work even if a dozen people are willing to die on the steps of the DeepMind office, because the government will intervene, or they’ll be painted as loons, or the attention will never materialize and their ardor will wane).
I’m pretty confused about how, under any reasonable analysis, this could come out looking positive EV. Most of these extreme forms of protest just don’t work in America (e.g. the soldier who self-immolated a few years ago). And if it’s not intended to be extreme, they’ve (I presume accidentally) misbranded their actions.
Fair enough. I think these actions are +ev under a coarse grained model where some version of “Attention on AI risk” is the main currency (or a slight refinement to “Not-totally-hostile attention on AI risk”). For a domain like public opinion and comms, I think that deploying a set of simple heuristics like “Am I getting attention?” “Is that attention generally positive?” “Am I lying or doing something illegal?” can be pretty useful.
Michael said on twitter here that he’s had conversations with two sympathetic DeepMind employees, plus David Silver, who was also vaguely sympathetic. This itself is more +ev than I expected already, so I’m updating in favour of Michael here.
It’s also occurred to me that if any of the CEOs cracks and at least publicly responds to the hunger strikers, then the CEOs who don’t do so will look villainous, so you actually only need one of them to respond to get a wedge in.
“Attention on AI risk” is a pretty bad proxy to optimize for, where available tactics include the kind of attention that would be paid to luddites, lunatics, and crackpots caring about some issue.
The actions that we can take can:
Use what separates us from people everyone considers crazy: that our arguments check out and our predictions hold; communicate those;
Spark and mobilize existing public support;
Be designed to optimize for positive attention, not for any attention.
I don’t think DeepMind employees really changed their minds? Like, there are people at DeepMind with p(doom) higher than Eliezer’s; they would be sympathetic; would they change anything they’re doing? (I can imagine it prompting them to talk to others at DeepMind, talking about the hunger strike to validate the reasons for it.)
I don’t think Demis responding to the strike would make Dario look particularly villainous; happy to make conditional bets. How villainous someone looks here should be pretty independent, outside of, e.g., Demis responding and thereby prompting a journalist to ask Dario, which would take plausible deniability away from him.
I’m also not sure how effective it would be to use this to paint the companies (or the CEOs—are they even the explicit targets of the hunger strikes?) as villainous.
To clarify, “think for five minutes” was an appeal to people who might want to do these kinds of things in the future, not a claim about Guido or Michael.
That said, I do in fact claim they have not thought carefully about their theory of change, and the linked comment from Michael lists very obvious surface-level reasons for why to do this in front of Anthropic and not OpenAI; I really would not consider this to demonstrate having thought carefully about the theory of change.
While in principle, as I mentioned, a hunger strike can bring attention, this is not an effective way to do this for the particular issue that AI will kill everyone by default. The diff to communicate isn’t “someone is really scared of AI ending the world”; it’s “scientists think AI might literally kill everyone and also here are the reasons why”.
This was not a claim about these people but an appeal to potential future people to maybe do research on this stuff before making decisions like this one.
That said, I talked to Guido prior to the start of the hunger strike, tried to understand his logic, and was not convinced he had any kind of reasonable theory of change guiding his actions, and my understanding is that he perceives it as the proper action to take, in a situation like that, which is why I called this vibe-protesting.
(It’s not very clear what would be the conditions for them to stop the hunger strikes.)
Hunger strikes can be very effective and powerful if executed wisely. My comment expresses my strong opinion that this did not happen here, not that it can’t happen in general.
I think I somewhat agree, but also I think this is a more accurate vibe than “yay tech progress”. It seems like a step in the right direction to me.
You repeat a recommendation not to risk your life. Um, I’m willing to die to prevent human extinction. The math is trivial. I’m willing to die to reduce the risk by a pretty small percentage. I don’t think a single life here is particularly valuable on consequentialist terms.
There’s important deontology about not unilaterally risking other people’s lives, but this mostly goes away in the case of risking your own life. This is why many medical ethics guidelines separate self-experimentation as a special case from the rules for experimenting on others (and self-experimentation has been used very well in many cases and aligns incentives). I think one should have dignity and respect oneself, but I think there are many self-respecting situations where one should take major personal sacrifices and risk one’s whole life. (Somewhat similarly, there are many situations in which one should risk being prosecuted unjustly by the state and spending a great deal of one’s life in prison.)
I don’t think so. I agree we shouldn’t have laws around this, but insofar as we have deontologies to correct for circumstances where our naive utility-maximizing calculations have historically been consistently biased, I think there have been enough cases of people uselessly martyring themselves for their causes to justify a deontological rule against sacrificing your own actual life.
Edit: Basically, I don’t want suicidal people to back-justify batshit insane reasons why they should die to decrease x-risk instead of getting help. And I expect these are the only people who would actually be at risk for a plan which ends with “and then I die, and there is 1% increased probability everyone else gets the good ending”.
I recently read The Sacrifices We Choose to Make by Michael Nielsen, which was a good read. Here are some relevant extracts.
Nielsen also includes unsuccessful or actively repugnant examples of it.
I also think this paragraph about Quang Duc is quite relevant:
I’m not certain if there’s a particular point you want me to take away from this, but thanks for the information, and for including an unbiased sample from the article you linked. I don’t think I changed my mind much from reading this, though.
Do you also believe there is a deontological rule against suicide? I have heard it rumored that most people who attempt suicide and fail regret it. At the same time, I think some lives are worse than death (for example, see Amanda Luce’s Book Review: Two Arms And A Head, which won the ACX book review prize), and so I believe it should be legal and sometimes supported, even if it were the case that most attempted suicides have been regretted.
After doing some research on this, I think this is unlikely to be true. The only quantitative study I found says that among its sample of suicide attempt survivors, 35.6% are glad to have survived, while 42.7% feel ambivalent, and 21.6% regret having survived. I also found a couple of sources agreeing with your “rumor”, but one cited just a suicide awareness trainer as its source, while the other cited the above study as the only evidence for its claim, somehow interpreting it as “Previous research has found that more than half of suicidal attempters regret their suicidal actions.” (Gemini 2.5 Pro says “It appears the authors of the 2023 paper misinterpreted or misremembered the findings of the 2005 study they cited.”)
If this “rumor” were true, I would expect to see a lot of studies supporting it, because such studies are easy to do and the result would be highly useful for people trying to prevent suicides (i.e., they could use it to convince potential suicide attempters that they’re likely to regret it). Evidence to the contrary is likely to be suppressed or not gathered in the first place, as almost nobody wants to encourage suicide. (The above study gathered the data incidentally, for a different purpose.) So everything seems consistent with the “rumor” being false.
Interesting, thanks. I think I had heard the rumor before and believed it.
In the linked study, it looks like they asked the people about regret very shortly after the suicide attempt. This could both bias the results towards less regret to have survived (little time to change their mind) or more regret to have survived (people might be scared to signal intent to retry suicide, for fear of being committed, which I think sometimes happens soon after failed attempts).
I think very very many people are not making an informed decision when they decide to commit suicide.
For example, I think quantum immortality is quite plausibly a thing. Very few people know about quantum immortality and even fewer have seriously thought about it. This means that almost everyone on the planet might have a very mistaken model of what suicide actually does to their anticipated experience.[1] Also, many people are religious and believe in a pleasant afterlife. Many people considering suicide are mentally ill in a way that compromises their decision making. Many people think transhumanism is impossible and won’t arrange for their brain to be frozen for that reason.
I agree that there is some threshold on the fraction of ill-considered suicides relative to total suicides such that suicide should be legal if we were below that threshold. I used to think we were maybe below that threshold. After I began studying physics at uni and so started taking quantum immortality more seriously, I switched to thinking we are maybe above the threshold.
You might find yourself in a branch where your suicide attempt failed, but a lot of your body and mind were still destroyed. If you keep exponentially decreasing the amplitude of your anticipated future experience in the universal wave function further, you might eventually find that it is now dominated by contributions from weird places and branches far-off in spacetime or configuration space that were formerly negligible, like aliens simulating you for some negotiation or other purpose.
I don’t really know yet how to reason well about what exactly the most likely observed outcome would be here. I do expect that by default, without understanding and careful engineering that our civilisation doesn’t remotely have yet, it’d tend to be very Not Good.
This all feels galaxy-brained to me and like it proves too much. By analogy I feel like if you thought about population ethics for a while and came to counterintuitive conclusions, you might argue that people who haven’t done that shouldn’t be allowed to have children; or if they haven’t thought about timeless decision theory for a while they aren’t allowed to get a carry license.
I don’t think it proves too much. Informed decision-making comes in degrees, and some domains are just harder? Like, I think my threshold for leaving people free to make their own mistakes if they are the only ones harmed by them is very low, compared to where the human population average seems to be at the moment. But my threshold is, in fact, greater than zero.
For example, there are a bunch of things I think bystanders should generally prevent four-year-old human children from doing, even if the children insist that they want to do them. I know that stopping four-year-old children from doing these things will be detrimental in some cases, and that having such policies is degrading to the children’s agency. I remember what it was like being four years old and feeling miserable because of kindergarten teachers who controlled my day and thought they knew what was best for me. I still think the tradeoff is worth it on net in some cases.
I just think that the suicide thing happens to be a case where doing informed decision-making is maybe just too tough for way too many humans and thus some form of ban could plausibly be worth it on net. Sports betting is another case where I was eventually convinced that maybe a legal ban of some form could be worth it.
(I agree with Lucious in that I think it is important that people have the option of getting cryopreserved and also are aware of all the reality-fluid stuff before they decide to kill themselves.)
“Important” is ambiguous, in that I agree it matters, but it is a lot to ask for this civilization to ban whole life options from people until they have heard about niche philosophy. Most people will never hear about niche philosophy.
I don’t think quantum immortality changes anything. You can reframe this in terms of standard probability theory and condition on continuing to have subjective experience, and still get to the same calculus.
However, only considering the branches in which you survive, or conditioning on having subjective experience after the suicide attempt, ignores the counterfactual suffering prevented in all the branches (or probability mass) in which you did die; those branches may be less unpleasant than the ones in which you survived, but they are far more numerous! Ignoring those branches biases the reasoning toward rare survival tails that don’t dominate the actual expected utility.
I agree that quantum mechanics is not really central for this on a philosophical level. You get a pretty similar dynamic just from having a universe that is large enough to contain many almost-identical copies of you. It’s just that it seems at present very unclear and arguable whether the physical universe is in fact anywhere near that large, whereas I would claim that a universal wavefunction which constantly decoheres into different branches containing different versions of us is pretty strongly implied to be a thing by the laws of physics as we currently understand them.
It is very late here and I should really sleep instead of discussing this, so I won’t be able to reply as in-depth as this probably merits. But, basically, I would claim that this is not the right way to do expected utility calculations when it comes to ensembles of identical or almost-identical minds.
A series of thought experiments might help illustrate part of where my position comes from:
Imagine someone tells you that they will put you to sleep and then make two copies of you, identical down to the molecular level. They will place you in a room with blue walls. They will place one copy of you in a room with red walls, and the other copy in another room with blue walls. Then they will wake all three of you up.
What color do you anticipate seeing after you wake up, and with what probability?
I’d say 2⁄3 blue, 1⁄3 red. Because there will now be three versions of me, and until I look at the walls I won’t know which one I am.
Imagine someone tells you that they will put you to sleep and then make two copies of you. One copy will not include a brain. It’s just a dead body with an empty skull. Another copy will be identical to you down to the molecular level. Then they will place you in a room with blue walls, and the living copy in a room with red walls. Then they will wake you and the living copy up.
What color do you anticipate seeing after you wake up, and with what probability? Is there a 1⁄3 probability that you ‘die’ and don’t experience waking up because you might end up ‘being’ the corpse-copy?
I’d say 1⁄2 blue, 1⁄2 red, and there is clearly no probability of me ‘dying’ and not experiencing waking up. It’s just a bunch of biomass that happens to be shaped like me.
As 2, but instead of creating the corpse-copy without a brain, it is created fully intact, then its brain is destroyed while it is still unconscious. Should that change our anticipated experience? Do we now have a 1⁄3 chance of dying in the sense that we might not experience waking up? Is there some other relevant sense in which we die, even if it does not affect our anticipated experience?
I’d say no and no. This scenario is identical to 2 in terms of the relevant information processing that is actually occurring. The corpse-copy will have a brain, but it will never get to use it, so it won’t affect my expected anticipated experience in any way. Adding more dead copies doesn’t change my anticipated experience either. My best scoring prediction will be that I have 1⁄2 chance of waking up to see red walls, and 1⁄2 chance of waking up to see blue walls.
In real life, if you die in the vast majority of branches caused by some event, i.e. that’s where the majority of the amplitude is, but you survive in some, the calculation for your anticipated experience would seem to not include the branches where you die for the same reason it doesn’t include the dead copies in thought experiments 2 and 3.
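To make the renormalization this describes concrete, here is a minimal sketch; the labels and weights are hypothetical, with “weight” standing in for amplitude / reality-fluid, not a claim about the actual physics:

```python
# Sketch: anticipated experience is renormalized over the copies that actually
# get to experience anything; weights are hypothetical stand-ins for amplitude.

def anticipated(outcomes):
    """outcomes: (label, weight, experiences_waking_up) triples."""
    live = [(label, w) for label, w, experiences in outcomes if experiences]
    total = sum(w for _, w in live)
    probs = {}
    for label, w in live:
        probs[label] = probs.get(label, 0.0) + w / total
    return probs

# Thought experiment 1: the original plus two copies, all wake up.
print(anticipated([("blue", 1, True), ("blue", 1, True), ("red", 1, True)]))
# -> {'blue': 0.666..., 'red': 0.333...}

# Thought experiments 2 and 3: the corpse-copy never experiences anything.
print(anticipated([("blue", 1, True), ("red", 1, True), ("corpse", 1, False)]))
# -> {'blue': 0.5, 'red': 0.5}
```

The dead copies contribute zero weight to the denominator, which is exactly why they drop out of the anticipated-experience calculation above.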
(I think Eliezer may have written about this somewhere as well using pretty similar arguments, maybe in the quantum physics sequence, but I can’t find it right now.)
Again, not sure why a large universe is needed. The expected utility ends up the same either way, whether you have some fraction of branches in which you remain alive or some probability of remaining alive.
Regarding the expected utility calculus: I agree with everything you said, but I don’t see how any of it allows you to disregard the counterfactual suffering from not committing suicide in your expected value calculation.
Maybe the crux is whether we consider the utility of each “you” (i.e. you in each branch) individually, and add it up for the total utility, or whether we consider all “you”s to have just one shared utility.
Let’s say that not committing suicide gives you −1 utility in n branches, but committing suicide gives you −100 utility in n/m branches and 0 utility in n−n/m branches.
If we treat all copies of you as having separate utilities and add them all up for a total expected utility calculation, not committing suicide gives −n utility while committing suicide leads to −100n/m utility. Therefore, as long as m>100, it is better to commit suicide.
If, on the other hand you treat them as having one shared utility, you get either −1 or −100 utility, and −100 is of course worse.
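A minimal sketch of this arithmetic in code, using placeholder numbers (n = 1000, m = 200, and the −1/−100 utilities from above are illustrative only, not claims about real utilities):

```python
# Minimal sketch of the two aggregation rules above; all numbers are placeholders.

def additive_total(n, m):
    """Each branch-copy is a separate experiencer; utilities add up across branches."""
    stay = n * (-1)                               # -1 in every one of n branches
    attempt = (n / m) * (-100) + (n - n / m) * 0  # -100 in n/m branches, 0 elsewhere
    return stay, attempt

def shared_conditional():
    """All copies share one utility; condition on continued experience."""
    stay = -1
    attempt = -100  # the only continuing experience is the bad-survival one
    return stay, attempt

print(additive_total(1000, 200))  # (-1000, -500.0): with m > 100, addition favors the attempt
print(shared_conditional())       # (-1, -100): the shared-utility view favors staying
```

Under the additive rule the comparison flips exactly at m = 100, so which aggregation rule you accept is doing all the work here.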
Do you agree that this is the crux? If so, why do you think that all the copies share one utility rather than their utilities adding up?
In a large universe, you do not end. Like, it’s not that in expectation you see some branch versus another; you just continue, the computation that is you continues. When you open your eyes, you’re not likely to find yourself as a person in a branch computed only relatively rarely; still, that person continues, and does not die.
Attempted suicide reduces your reality-fluid (how much you’re computed and how likely you are to find yourself there), but you will continue to experience the world. If you die in a nuclear explosion, the continuation of you will be somewhere else, sort-of isekaied; and mostly you will find yourself not in a strange world that recovers the dead but in a world where the nuclear explosion did not happen; still, in a large world, even after a nuclear explosion, you continue.
You might care about having a lot of reality-fluid, because this makes your actions more impactful, because you can spend your lightcone better, and improve the average experience in the large universe. You might also assign negative utility to others seeing you die; they’ll have a lot of reality-fluid in worlds where you’re dead and they can’t talk to you, even as you continue. But I don’t think it works out to assigning the same negative utility to dying as in branches of small worlds.
Yes, but the number of copies of you still reduces (or the probability that you are alive in standard probability theory, or the number of branches in many worlds). Why are these not equivalent in terms of the expected utility calculus?
Imagine that you’re an agent in the Game of Life. Your world, your laws of physics, are computed on a very large number of independent computers, all performing the same computation.
You exist within the laws of causality of your world, computed as long as at least one server computes your world. If some of them stop performing the computation, it won’t be a death of a copy; you’ll just have one fewer instance of yourself.
What’s the difference between fewer instances and fewer copies, and why is that load-bearing for the expected utility calculation?
You are of course right that there’s no difference between reality-fluid and normal probabilities in a small world: it’s just how much you care about various branches relative to each other, regardless of whether all of them will exist or only some.
I claim that the negative utility due to ceasing to exist is just not there, because you don’t actually stop existing in a way you reflectively care about when you have fewer instances. For normal things (e.g., how much you care about paperclips), the expected utility is the same; but here, it’s the kind of terminal value that I expect would be different for most people: guaranteed continuation in 5% of instances is much better than a 5% chance of continuing in all instances; in the first case, you don’t die!
But we are not talking about negative utility due to ceasing to exist. We are talking about avoiding counterfactual negative utility by committing suicide, which still exists!
I think this is an artifact of thinking of all of the copies having a shared utility (i.e. you) rather than separate utilities that add up (i.e. so many yous will suffer if you don’t commit suicide). If they have separate utilities, we should think of them as separate instances of yourself.
And even in the case where we are assigning negative utility to death, most people are really considering counterfactual utility from being alive, and 95% of that (expected) counterfactual utility is lost whether 95% of the “instances of you” die or whether there is a 95% chance that “you” die.
I think there is, and I think cultural mores well support this. Separately, I think we shouldn’t legislate morality and though suicide is bad, it should be legal[1].
There also exist cases where it is in fact correct from a utilitarian perspective to kill, but this doesn’t mean there is no deontological rule against killing. We can argue about the specific circumstances where we need these rule carve-outs (eg war), but I think we’d agree that when it comes to politics and policy, there ought to be no carve-outs, since people are particularly bad at risk-return calculations in that domain.
But also this would mean we have to deal with certain liability issues, eg if ChatGPT convinces a kid to kill themselves, we’d like to say this is manslaughter or homicide iff the kid otherwise would’ve gotten better, but how do we determine that? I don’t know, and probably on net we should choose freedom instead, or this isn’t actually much of a problem in practice.
Makes sense. I don’t hold this stance; I think my stance is that many/most people are kind of insane on this, but that like with many topics we can just be more sane if we try hard and if some of us set up good institutions around it for helping people have wisdom to lean on in thinking about it, rather than having to do all their thinking themselves with their raw brain.
(I weakly propose we leave it here, as I don’t think I have a ton more to say on this subject right now.)
To clarify, I meant that the choice of actions was based on the vibes (on this seeming like the right thing to do in these circumstances), not on careful consideration.
I maybe formulated this badly.
I do not disagree with that part of your comment. I did, in fact, risk being prosecuted unjustly by the state and spending a great deal of my life in prison. I was also aware of the kinds of situations I’d want to go for hunger strikes in while in prison, though didn’t think about that often.
And I, too, am willing to die to reduce the risk by a pretty small chance.
Most of the time, though, I think people who think they have this choice don’t actually face it; I think the bar for risking one’s life should be very high. In particular, when people have time to carefully do the math, I really want them to carefully do the math before deciding to risk their lives, and in this particular case, some of my frustration is from the people getting their math wrong.
I think as a community, we also would really want to make people err on the side of safety, and have a strong norm of assumption that most people who decide to sacrifice their lives got their math wrong. People really shouldn’t be risking their lives without having carefully thought of the theory of change when they have the ability to do so.
Like, I’d bet if we find people competent in how movements achieve their goals, they will say that these particular hunger strikes are not great; and I expect it to be the case most of the time when individuals who share values with a larger movement decide to go on a hunger strike even as the larger movement thinks that would not be effective.
I think I somewhat agree that these hunger strikes will not shut down the companies or cause major public outcry.
I think that there is definitely something to be said that potentially our society is very poor at doing real protesting, and will just do haphazard things and never do anything goal-directed. That’s potentially a pretty fundamental problem.
But setting that aside (which is a big thing to set aside!) I think the hunger-strike is moving in the direction of taking this seriously. My guess is most projects in the world don’t quite work, but they’re often good steps to help people figure out what does work. Like, I hope this readies people to notice opportunities for hunger strikes, and also readies them to expect people to be willing to make large sacrifices on this issue.
People do in fact try to be very goal-directed about protesting! They have a lot of institutional knowledge on it!
You can study what worked and what didn’t work in the past, and what makes a difference between a movement that succeeds and a movement that doesn’t. You can see how movements organize, how they grow local leaders, how they come up with ideas that would mobilize people.
A group doesn’t have to attempt a hunger strike to figure out what the consequences would be; it can study and think, and I expect that to be a much more valuable use of time than doing hunger strikes.
I’d be interested to read a quick post from you that argued “Hunger-strikes are not the right tool for this situation; here is what they work for and what they don’t work for. Here is my model of this situation and the kind of protests that do make sense.”
I don’t know much about protesting. Most of the recent ones that got big enough for me to hear about them have been essentially ineffectual as far as I can recall (Occupy Wall Street, Women’s March, No Kings). I am genuinely interested in reading about effective (and clearly so) protests led by anyone currently doing protests, or within the last 10 years. Even if on a small scale.
(My thinking isn’t that protests have not worked in the past – I believe they have, MLK, Malcolm X, Women’s Suffrage Movement, Vietnam War Protest, surely more – but that the current protesting culture has lost its way and is no longer effective.)
Caveat that I don’t know much more than this, but I’m reminded of James Ozden’s lit reviews, e.g. How effective are protests? Some research and some nuance. Ostensibly relevant bits:
(Would be interested in someone going through this paper and writing a post or comment highlighting some examples and why they’re considered successful.)
Not quite responding to your main point here, but I’ll say that this position would seem valid to me and good to say if you believed it.
I don’t know what personal life tradeoffs any of them are making, so I have a hard time speaking to that. I just found out that Michael Trazzi is one of the people doing a hunger strike; I don’t think it’s true of him that he hasn’t thought seriously about the issues given how he’s been intellectually engaged for 5+ years.
Yep, I basically believe this.
(Social movements (and comms and politics) are not easy to reason about well from first principles. I think Michael is wrong to be making this particular self-sacrifice, not because he hasn’t thought carefully about AI but because he hasn’t thought carefully about hunger strikes.)
Relevantly, if any of them actually die, and if also it does not cause major change and outcry, I will probably think they made a foolish choice (where ‘foolish’ means ‘should have known in advance this was the wrong call on a majorly important decision’).
My modal guess is that they will all make real sacrifice, and stick it out for 10-20 days, then wrap up.
On the object level it’s (also) important to emphasize that these guys don’t seem to be seriously risking their lives. At least one of them noted he’s taking vitamins, hydrating etc. On consequentialist grounds I consider this to be an overdetermined positive.
A hunger strike will eventually kill you even if you take vitamins, electrolytes, and sugar. (A way to prevent death despite the target not giving in is often a group of supporters publicly begging the person on the hunger strike to stop and not kill themselves for some plausible reasons, but sometimes people ignore that and die.) I’m not entirely sure what Guido’s intention is if Anthropic doesn’t give in.
Sure, I just want to defend that it would also be reasonable if they were doing a more intense and targeted protest. “Here is a specific policy you must change” and “I will literally sacrifice my life if you don’t make this change”. So I’m talking about the stronger principle.
Isn’t suicide already legal in most places?
I think in a lot of places the government will try to stop you, including using violence.
I don’t strongly agree or disagree with your empirical claims but I do disagree with the level of confidence expressed. Quoting a comment I made previously:
I’m undecided on whether things like hunger strikes are useful but I just want to comment to say that I think a lot of people are way too quick to conclude that they’re not useful. I don’t think we have strong (or even moderate) reason to believe that they’re not useful.
When I reviewed the evidence on large-scale nonviolent protests, I concluded that they’re probably effective (~90% credence). But I’ve seen a lot of people claim that those sorts of protests are ineffective (or even harmful) in spite of the evidence in their favor.[1] I think hunger strikes are sufficiently different from the sorts of protests I reviewed that the evidence might not generalize, so I’m very uncertain about the effectiveness of hunger strikes. But what does generalize, I think, is that many people’s intuitions on protest effectiveness are miscalibrated.
[1] This may be less relevant for you, Mikhail Samin, because IIRC you’ve previously been supportive of AI pause protests in at least some contexts.
ETA: To be clear, I’m responding to the part of your post that’s about whether hunger strikes are effective. I endorse the positive message of the second half of your post.
ETA 2: I read Ben Pace’s comment and he is making some good points so now I’m not sure I endorse the second half.
To be very clear, I expect large social movements that use protests as one of their forms of action to have the potential to be very successful and impactful if done well. Hunger strikes are significantly different from protests. Hunger strikes can be powerful, but they’re best suited to very different contexts.
Aside from whether or not the hunger strikes are a good idea, I’m really glad they have emphasized conditional commitments in their demands.
I think that we should be pushing on these much much more: getting groups to say “I’ll do X if abc groups do X as well”
And should be pushing companies/governments to be clear whether their objection is “X policy is net-harmful regardless of whether anyone else does it” vs “X is net-harmful for us if we’re the only ones to do it”
[I recognize that some of this pushing/clarification might make sense privately, and that groups will be reluctant to say stuff like this publicly because of posturing and whatnot.]
(While I like it being directed towards coordination, it would not actually make a difference, as it won’t be the case that all AI companies want to stop, and so it would still not be of great significance. The thing that works is a gov-supported ban on developing ASI anywhere in the world. A commitment to stop if everyone else stops doesn’t actually come into force unless everyone is required to stop anyway.
An ask that works is, e.g., “tell the government they need to stop everyone, including us”.)
I think we should show some solidarity to people committed to their beliefs and making a personal sacrifice, rather than undermining them by critiquing their approach.
Given that they’re both young men and it is occurring in a first world country, it seems unlikely anyone will die. But it does seem likely they or their friends will read this thread.
Beyond that, the hunger strike is only on day 2 and has already received a small amount of media coverage. Should they go viral, then this one action alone will have a larger differential impact on reducing existential risk than most safety researchers will achieve in their entire careers.
https://www.businessinsider.com/hunger-strike-deepmind-ai-threat-fears-agi-demis-hassabis-2025-9
This is surprising to hear on LessWrong, where we value truth without having to think of object-level reasons for why it is good to say true things. But on the object level: it would be very dangerous for a community to avoid saying true things because it is afraid of undermining someone’s sacrifice; this would lead to a lot of needless, and even net-negative, sacrifice, without mechanisms for self-correction. Like, if I ever do something stupid, please tell me (and everyone) that instead of respecting my sacrifice: I would not want others to repeat my mistakes.
(There are lots of ways to get media coverage and it’s not always good in expectation. If they go viral, in a good way/with a good message, I will somewhat change my mind.)