Interesting. Both of those posts have the form of “what would an agreement say” which I think is totally missing the hard part. So I think that points at the answer to your original question, and why others regard it as obvious and you do not.
The answer is “because there’s no political will”. And so the question isn’t what would an agreement say, it’s where would the political will come from.
My answer is that the political will will come from AI progress, particularly from visible job loss and from human-seeming AI systems, which will trip our pattern-matching to strange humans, whom we intuitively regard as quite dangerous. Xenophobia exists for a reason; strange humans have been among our biggest dangers since the start of evolution. I’ve written about this in “A country of alien idiots in a datacenter: AI progress and public alarm” and in bits and pieces elsewhere.
On the positive side, this answer implies that the political will will come, which shifts the question back to having an agreement or treaty ready to offer.
The problem with this answer is that the will might come too late. By the time systems are visibly taking jobs and acting agentically and competently, we might already have a takeover-capable system in development, and it will be too late for anything as slow as international agreements. There still might be time for an executive order and informal power grabs in the face of adequate public (and politician) freakout. That might be substantially useful, since only China is near the US, and they’d probably take a much more cautious approach to AGI and alignment (see “China won’t win the AI race but would it be much worse if it did?” and similar). I’ve written about this in “Whether governments will control AGI is important and neglected”, and I now think the answer is just clearly yes, but maybe not in time.
That’s a slowdown, not a pause, but it could perhaps be expanded to a pause if the discussion between the US and China can move quickly. I think this is possible. We’re really not enemies, just competitors, and our researchers are quite well-disposed toward each other.
Anyway, I think the major crux between us and the rest of the world is alignment risk. The average estimate, even among those who acknowledge the risk (which tbf is now pretty much anyone thinking about AGI), is maybe ten percent or lower. That’s enough to make some people want to pause, but not, I think, most of them, and not enough to make it a high priority.
So I think the clearest path toward pause (or slowdown) is clearer arguments about alignment risk.
Here I think overclaiming on the technical arguments has done grievous harm to the cause. Yudkowsky’s claim that misalignment is 99%+ likely has drawn much irritation, ire, and attention. People, even sophisticated people, routinely argue “alignment is possible” instead of arguing about how possible it is. I think they’re quite correct that Yudkowsky’s technical argument is full of holes, but quite wrong in the implied leap from there to ~10% risk. Alignment may be quite achievable and still quite difficult to achieve on the current rushed path.
I think arguments for human incompetence on first tries and under pressure are a much better bet. To his credit, EY has shifted hard in this direction, and so have the handful of others making technical arguments for alignment difficulty. My arguments center on model uncertainty: nobody knows how hard alignment is; estimates from people with real time-on-task range from very low to very high; therefore the wisest assumption is that it could be extremely difficult and we are foolish to press ahead with so much unknown.
Here I think we could do vastly better. Optimists reason that current systems seem pretty aligned, so we’re probably on track to align more powerful systems. Pessimists argue that this isn’t useful evidence at all. Identifying cruxes and improving models of likely first AGI seems quite achievable, so that’s what I’m primarily working toward and asking others to engage in.
WRT the Anthropic office visits: This has the general form of “it’s their fault not ours” which is suspicious. In most disagreements, both parties blame the other. And even if it is totally their fault, I’d rather survive than assign blame. Usually the way forward in resolving interpersonal issues is “sorry about that, let’s try again,” followed by being nicer.
That applies when you’re trying to reach mutual agreement with someone, not when you’re negotiating a deal and have some leverage. Discussions about beliefs only resemble negotiations when the evidence is overwhelming, and on alignment risk it unfortunately isn’t.
Both of those posts have the form of “what would an agreement say” which I think is totally missing the hard part. So I think that points at the answer to your original question, and why others regard it as obvious and you do not.
You might be overinferring what I think these blog posts indicate? I’m just gesturing that I agree that the overall project of figuring out how the whole thing might be feasible is a worthy project.
The answer is “because there’s no political will”.
I know that this is a thing people say, and I agree the political will isn’t already gathered. But if the implication is that it would be an infeasible task to create and gather the political will for a global stop, that implication is one I strongly question! So far I hear lots of signs pointing in the opposite direction, and I’m grateful to the people working on that. I just wish that Anthropic would support those efforts.
WRT the Anthropic office visits: This has the general form of “it’s their fault not ours” which is suspicious.
Not blaming, describing. Can’t survive without describing.
(Anyway, just FYI, your time might be somewhat wasted if you want to get me on board with a particular approach / stance, because I’m much more commenting from the sidelines rather than an active participant; I’m focusing on other things, while others are actually working on communicating with the public and political leaders and so on.)
That’s fine, I’ll consider it workshopping.
I’m also primarily occupied with other things. I’m spending some time on communication strategy and the logic of how opinion and policy could change, because it seems like it could be critically important, and not enough people seem to be thinking about it. As you note.
I hope you will too.