I see no justification here for going to the meta-level and claiming they did not think for 5 minutes
Michaël Trazzi’s comment, which he wrote a few hours before he started his hunger strike, isn’t directly about hunger striking but it does indicate to me that he put more than 5 minutes of thought into the decision, and his comment gestures at a theory of change.
I spoke to Michaël in person before he started. I told him I didn’t think the game theory worked out (if he’s not willing to die, GDM should ignore him; if he does die, he’s worsening the world, since he can definitely contribute more by being alive, and GDM should still ignore him). I don’t think he’s going to starve himself to death or to the point of serious harm, but that does make the threat empty. I don’t think that matters too much from a game-theoretic-reputation standpoint, though, since nobody seems to be expecting him to do that.
His theory of change was basically “If I do this, other people might” which seems to be true: he did get another person involved. That other person has said they’ll do it for “1-3 weeks” which I would say is unambiguously not a threat to starve oneself to death.
As a publicity stunt it has kinda worked in the basic sense of getting publicity. I think it might change the texture and vibe of the AI protest movement in a direction I would prefer it to not go in. It certainly moves the salience-weighted average of public AI advocacy towards Stop AI-ish things.
As Mikhail said, I feel great empathy and respect for these people. My first instinct was similar to yours, though: if you’re not willing to die, it won’t work, and you probably shouldn’t be willing to die (because that also won’t work, there are more reliable ways to contribute, and timelines are uncertain).
I think ‘I’m doing this to get others to join in’ is a pretty weak response to this rebuttal. If they’re also not willing to die, then it still won’t work, and if they are, you’ve wrangled them in at more risk than you’re willing to take on yourself, which is pretty bad (and again, it probably still won’t work even if a dozen people are willing to die on the steps of the DeepMind office, because the government will intervene, or they’ll be painted as loons, or the attention will never materialize and their ardor will wane).
I’m pretty confused about how, under any reasonable analysis, this could come out looking positive EV. Most of these extreme forms of protest just don’t work in America (e.g. the soldier who self-immolated a few years ago). And if it’s not intended to be extreme, they’ve (I presume accidentally) misbranded their actions.
Fair enough. I think these actions are +ev under a coarse-grained model where some version of “Attention on AI risk” is the main currency (or a slight refinement, “not-totally-hostile attention on AI risk”). For a domain like public opinion and comms, deploying a set of simple heuristics like “Am I getting attention?”, “Is that attention generally positive?”, and “Am I lying or doing something illegal?” can be pretty useful.
Michael said on twitter here that he’s had conversations with two sympathetic DeepMind employees, plus David Silver, who was also vaguely sympathetic. This itself is more +ev than I expected already, so I’m updating in favour of Michael here.
It’s also occurred to me that if any of the CEOs cracks and at least publicly responds to the hunger strikers, then the CEOs who don’t will look villainous, so you actually only need one of them to respond to get a wedge in.
“Attention on AI risk” is a pretty bad proxy to optimize for, because the available tactics include getting the kind of attention that would be paid to luddites, lunatics, and crackpots who care about some issue.
The actions we take can instead:
Use what separates us from people everyone considers crazy: that our arguments check out and our predictions hold; communicate those;
Spark and mobilize existing public support;
Be designed to optimize for positive attention, not for any attention.
I don’t think DeepMind employees really changed their minds? Like, there are people at DeepMind with p(doom) higher than Eliezer’s; they would be sympathetic, but would they change anything they’re doing? (I can imagine it prompting them to talk to others at DeepMind about the hunger strike, to validate the reasons for it.)
I don’t think Demis responding to the strike would make Dario look particularly villainous; happy to make conditional bets on this. How villainous each CEO looks here should be pretty independent of the others, outside of e.g. Demis responding and thereby prompting a journalist to ask Dario, which would take plausible deniability away from him.
I’m also not sure how effective it would be to use this to paint the companies (or the CEOs—are they even the explicit targets of the hunger strikes?) as villainous.
To clarify, “think for five minutes” was an appeal to people who might want to do these kinds of things in the future, not a claim about Guido or Michael.
That said, I do in fact claim they have not thought carefully about their theory of change, and the linked comment from Michael lists very obvious surface-level reasons for why to do this in front of Anthropic and not OpenAI; I really would not consider this to be on the level of demonstrating careful thought about the theory of change.