Our government is determined to lose the AI race in the name of winning the AI race.
The least we can do, if prioritizing winning the race, is to try and actually win it.
This is a bizarre pair of claims to make. But I think it illustrates a surprisingly common mistake from the AI safety community, which I call “jumping down the slippery slope”. More on this in a forthcoming blog post, but the key idea is that when you look at a situation from a high level of abstraction, it often seems like sliding down a slippery slope towards a bad equilibrium is inevitable. From that perspective, the sort of people who think in terms of high-level abstractions feel almost offended when people don’t slide down that slope. On a psychological level, the short-term benefit of “I get to tell them that my analysis is more correct than theirs” outweighs the long-term benefit of “people aren’t sliding down the slippery slope”.
One situation where I sometimes get this feeling is when a shopkeeper charges less than the market rate because they want to be kind to their customers. This is typically a redistribution of money from a wealthier person to less wealthy people; and even when it isn't, it's still a virtuous thing to do. But I sometimes actually get annoyed at them, and itch to smugly say "listen, you dumbass, you just don't understand economics". It's like a part of me treats reaching the equilibrium as a goal in itself, whether or not we actually like the equilibrium.
Doing this in AI safety is obviously much worse. Relevant examples include Situational Awareness and safety-motivated capability evaluations (e.g. "building great capability evals is something the labs should obviously do, so our work on it isn't harmful"). It feels like Zvi is doing this here too. Why is trying to actually win the race the least we can do? Isn't that exactly the opposite of what would promote crucial international cooperation on AI? Is it really so annoying when your opponents are shooting themselves in the foot that it's worth advocating for them to stop doing that?
It kinda feels like the old joke:
On a beautiful Sunday afternoon in the midst of the French Revolution, the revolting citizens lead a priest, a drunkard and an engineer to the guillotine. They ask the priest if he wants to face up or down when he meets his fate. The priest says he would like to face up so he will be looking towards heaven when he dies. They raise the blade of the guillotine and release it. It comes speeding down and suddenly stops just inches from his neck. The authorities take this as divine intervention and release the priest.
The drunkard comes to the guillotine next. He also decides to die face up, hoping that he will be as fortunate as the priest. They raise the blade of the guillotine and release it. It comes speeding down and suddenly stops just inches from his neck. Again, the authorities take this as a sign of divine intervention, and they release the drunkard as well.
Next is the engineer. He, too, decides to die facing up. As they slowly raise the blade of the guillotine, the engineer suddenly says, “Hey, I see what your problem is …”
Zvi is arguing “X implies Y” here. Zvi happens to believe Y but disbelieve X; however, he is writing to people who think “X and not-Y”, in order to nudge them to support Y.
Here X = "it is good for the US to build superintelligence fast, before China does", and Y = "we should have some diffusion rules making it harder for China to catch up to the US".
Zvi believes Z = "nobody should be building superintelligence soon", and believes Z implies Y, but it is useful to show that X implies Y as well.