I personally find this pretty heartening. Anthropic is saying “someone should probably stop us if they can stop everyone” and I think that’s a pretty defensible stance. It’s not necessarily correct, but arguments about the difficulty of alignment vs. the badness of bad actors with AGI are much looser than we’d like.
Yes, though surely there must be steps they could take towards such a coordination mechanism on their own? For example, if they said “Anyone who gives us 100 billion dollars within the next week gets standing to sue if we breach or change our RSP within 5 years,” I would already find that their most heartening commitment yet.
As I mentioned last week, I’d love it if they’d acknowledge the potential for concentrated influence/power risk in principle, even if they think they won’t or can’t do that.
How the alignment problem gets solved—or not—in this future is something we are least certain about.
…
But if a slowdown simply lets the least cautious actors catch up technologically, it could leave everyone less safe.
Another crazy text in these crazy times. “We don’t know how to solve the alignment problem, but we’re going to race ahead anyway, because otherwise less cautious actors will win.” Which less cautious actors, Anthropic?
And also seconding Oliver’s question. What about power concentration, Anthropic? Your CEO has said literally this: “Anthropic has much more in common with the Department of War than we have differences.” Alignment to whom?
Aren’t xAI, which fails to care about alignment, and China, which trains models to censor themselves, perfect examples of such bad actors?
Additionally, Anthropic included an anti-power-concentration clause in Claude’s Constitution: “We’re especially concerned about the use of AI to help individual humans or small groups gain unprecedented and illegitimate forms of concentrated power. In order to avoid this, Claude should generally try to preserve functioning societal structures, democratic institutions, and human oversight mechanisms, and to avoid taking actions that would concentrate power inappropriately or undermine checks and balances.”
As for the impact on jobs, what about the fact that Anthropic’s CEO explicitly said: “I do think in the long run AI will become so broadly effective and so cheap that this will no longer apply. At that point our current economic setup will no longer make sense, and there will be a need for a broader societal conversation about how the economy should be organized”?
Could you suggest what else Anthropic should’ve done to acknowledge those issues?
Well, racing to recursive self-improvement (RSI) without solving alignment kills everyone. A company can’t justify that by pointing to other “bad actors”; at that point, it’s a bad actor itself.
And even before killing everyone, there’s other stuff that needs mentioning. Anthropic, like the other US labs afaik, has agreed that its models can be used by the US government for blanket surveillance of non-Americans. The US has a history of supporting nasty regimes abroad (see Operation Condor in South America) and of sending them data to help with political repression (see the Indonesian mass killings of 1965–66).
They don’t think it kills everyone with high likelihood. The problem is that no one knows for sure how hard alignment is. There are no really convincing arguments on either side. It’s a wild guess, and people are free to pick whatever their motivated reasoning prefers.
But something convinced them to split from OpenAI and start a more alignment-focused lab in the first place. So there must’ve been convincing arguments (to them) then. The simplest explanation why they’re racing to RSI now is that they got closer to money and power.
Anthropic’s p(doom) is probably a lot lower than 1, but it can be much higher than 0 and still justify their current behavior. They only need to presuppose that they have better alignment techniques and care more about alignment than their competitors, and right now they have good reasons to think that way. So let’s assume that you’re a decision maker at Anthropic who does think your lab has better alignment, and you have to decide whether to (unilaterally) keep racing or not.
If you think p(doom) is close to 1, then the only way to play is to stop racing yourself, since stopping can’t make things any worse.
If you think p(doom) is zero, then of course you should race, but if the Anthropic people thought that way, they wouldn’t have formed the company in the first place.
However, if p(doom) is somewhere in the middle (let’s say anywhere between 1% and 80%), and you think your lab’s alignment advantage over your competitors outweighs the extra acceleration that you yourself bring to the whole industry, then the best way to minimize p(doom) is to race and stay ahead of everyone else. If you don’t, your competitor, who is right behind you, will catch up and then proceed to not care about alignment as much as you do, resulting in a higher p(doom).
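The three cases above can be made concrete with a toy comparison of "race" versus "stop". This is a minimal Python sketch, not anyone's actual model: it assumes that racing keeps your lab ahead at the price of an added acceleration cost, and that stopping unilaterally hands the lead to a less careful competitor. The parameter names (`p_doom_self_leads`, `p_doom_rival_leads`, `acceleration_cost`) and all numbers are hypothetical.

```python
# Toy model of the race-or-stop argument. Every number is a hypothetical
# probability chosen for illustration; none are estimates from this thread.
from dataclasses import dataclass


@dataclass
class Scenario:
    p_doom_self_leads: float   # P(doom) if your lab stays ahead and builds AGI first
    p_doom_rival_leads: float  # P(doom) if a less careful competitor gets there first
    acceleration_cost: float   # extra P(doom) from your racing speeding up the industry


def p_doom_if_race(s: Scenario) -> float:
    # Assumption: racing keeps you in the lead but adds the acceleration cost.
    return min(1.0, s.p_doom_self_leads + s.acceleration_cost)


def p_doom_if_stop(s: Scenario) -> float:
    # Assumption: if you stop unilaterally, the competitor catches up and leads.
    return s.p_doom_rival_leads


def should_race(s: Scenario) -> bool:
    # Race only if doing so strictly lowers the overall probability of doom.
    return p_doom_if_race(s) < p_doom_if_stop(s)


if __name__ == "__main__":
    # Middle case: an alignment lead that outweighs the acceleration cost.
    print(should_race(Scenario(0.20, 0.40, 0.05)))  # True  (0.25 < 0.40)
    # Middle case, but the lead is too small to outweigh the acceleration cost.
    print(should_race(Scenario(0.30, 0.32, 0.05)))  # False (0.35 > 0.32)
    # Near-certain doom: racing does not bring p(doom) below the rival's.
    print(should_race(Scenario(0.97, 0.99, 0.05)))  # False (1.00 > 0.99)
```

Under these assumptions, racing wins only when the alignment advantage (`p_doom_rival_leads - p_doom_self_leads`) exceeds the acceleration you add, which is the condition stated above; when everyone's p(doom) is already close to 1, there is little advantage left to exploit.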
Amodei seems to think there’s a chance of misalignment, and I’m betting he and the other founders didn’t like or trust Altman. But they have a much lower p(doom) than you or I do.
I do think getting closer to money and power has shifted their thinking through motivated reasoning; but so has thinking they might lead the world to AI success and be the greatest heroes in history.
We need better arguments if we hope to get the world to stop them and everyone else from developing AGI. Failing that, we need to help them align it. Those two projects are synergistic because they both involve understanding alignment and misalignment risks very clearly.
This thread is going in circles; let me restart cleanly.
If outright misalignment (AI straight up kills everyone) is averted, the next most likely outcome is alignment to power. This is what the question “alignment to whom” is about (see my first comment). For most people in the world, including me (a non-American, etc.), this means permanent subjugation, which is as bad as death or worse. Right now Anthropic’s answer to “alignment to whom” is unsatisfactory (see my second comment). So no, nobody should help them align it unless they straighten out the “alignment to whom” story.
Permanent subjugation is as bad as death? I certainly agree that subjugation by some particularly sadistic and sociopathic leaders would be worse than death. I disagree for about 99% of “subjugators”. You and I have discussed this before, and it’s a deeper disagreement than we’ll be able to resolve here.
It’s also urgent to resolve this question: AI safety strategy depends on it, as this conversation demonstrates. But arguing about it, or declaring an answer as though either of us knows it, is not going to help with this critical project.
I do hope to come back and write about this carefully. I hope to see you and others who share your stance refine the arguments and evidence for “the average leader, given unlimited power, will kill you or make your life miserable or worse”.
I don’t remember where our most complete thread on this is; I think we had a fuller discussion on someone’s short and incomplete top-level post on this topic. We need better arguments on both sides so we can make a better guess.
Some time ago I came up with a debate protocol that might be a good fit for this question; let me know if you’d like to try it.
I’d love to! I like the format. The challenge is finding time. Timeboxing it would be great for efficiency. I wonder about posting it as one jointly authored post, written last, in which we each summarize our argument and link to our two individually authored posts.
I’ll DM you about timing and other logistics.
@Seth Herd @cousin_it I do remember a similar clash of positions between habryka and Villiam.
The case for the average leader making your life miserable is that the leader can seize all of the resources and won’t need anyone else for the vast majority of purposes.
A case against the leader making your life miserable would be something along the lines of ‘The leader has high enough integrity to propagate resources to their friends,[1] some of whom propagate the resources further,[2] so the graph of propagations is likely connected to you’ or ‘The leader doesn’t gain anything by robbing the majority of people’.
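The first argument amounts to a reachability question on a directed graph whose edges mean "X passes resources on to Y". Below is a minimal Python sketch using breadth-first search. The graph, the names, and the `reachable_from` helper are hypothetical illustrations, not a model anyone in this thread proposed.

```python
# Toy model of the "graph of propagations": starting from the leader, follow
# "X passes resources on to Y" edges and see who is eventually reached.
# The graph and every name in it are hypothetical.
from collections import deque


def reachable_from(leader: str, passes_to: dict[str, list[str]]) -> set[str]:
    """Return everyone who receives resources, directly or indirectly (BFS)."""
    reached = {leader}
    queue = deque([leader])
    while queue:
        person = queue.popleft()
        for recipient in passes_to.get(person, []):
            if recipient not in reached:
                reached.add(recipient)
                queue.append(recipient)
    return reached


if __name__ == "__main__":
    passes_to = {
        "leader": ["friend_a", "friend_b"],
        "friend_a": ["relative_of_a"],  # friend_a passes resources further
        "friend_b": [],                 # friend_b keeps everything
        "relative_of_a": ["you"],
    }
    reached = reachable_from("leader", passes_to)
    print("you" in reached)       # True: a chain of sharing reaches you
    print("stranger" in reached)  # False: no edge ever leads to a stranger
```

In this framing, the pre-ASI arrangements listed below are graphs in which no chain of edges leads from the leader's circle to most people.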
The closest pre-ASI equivalents of an arrangement where the graph doesn’t connect the leader to the majority of people are:
Now-outlawed practices like explicit racial segregation;
Class barriers, like those described by conspiracy theories[3] or those that socialists say exist under capitalism;
States during some periods of decline.
The last example is especially interesting because such periods often emerged in human history and would end either with the elites being purged after a leader recognized their incompetence, or with the entire power structure being disrupted and the incompetent eliminated in some other way. AIs, on the other hand, would systematically prevent such disruptions and would fail to prevent power concentration unless told to.
However, Claude’s current Constitution does have a line, quoted above, that explicitly tries to prohibit it from concentrating power. Does that imply that OpenAI and GDM, let alone xAI or Meta, should be prevented from taking part in the creation of ASI?
P.S. How likely is it that the entire premise of ASIs perfectly aligned to arbitrary whims is false, because ASIs either arrive at a true morality or commit genocide?
[1] Or to everyone, to people from a given ethnos, or to everyone obeying certain rules.
[2] See, e.g., footnote 56 of AI-2027’s Slowdown Branch. The AI-2027 authors are uncertain whether a power grab happens at all.
[3] For example, if an International Jewish conspiracy existed and the ASI was aligned to one of its members, then everyone not in the conspiracy would be doomed.