Isn’t your Squiggle model talking about whether racing is good, rather than whether unilaterally pausing is good?
Yes, the model is more about racing than about pausing, but I thought it was applicable here. My thinking was that there is a spectrum of development speed with “completely pause” on one end and “race as fast as possible” on the other. Pushing more toward the “pause” side of the spectrum has the ~opposite effect of pushing toward the “race” side.
I wish you’d try modeling this with more granularity than “is alignment hard” or whatever.
I’ve never seen anyone else try to quantitatively model it. As far as I know, my model is the most granular quantitative model of this question that anyone has made. Which isn’t to say it’s particularly granular (I spent less than an hour on it), but this feels like an unfair criticism.
In general, I am not a fan of criticisms of the form “this model is too simple”. All models are too simple. What, specifically, is wrong with it?
I had a quick look at the linked post, and it seems to be making some implicit assumptions, such as:
the plan of “use AI to make AI safe” has a ~100% chance of working (the post explicitly says this is false, but then proceeds as if it’s true)
there is a ~100% chance of slow takeoff
if you unilaterally pause, this doesn’t increase the probability that anyone else pauses, doesn’t make it easier to get regulations passed, etc.
I would like to see some quantification of the form “we think there is a 30% chance that we can bootstrap AI alignment using AI; a unilateral pause will only increase the probability of a global pause by 3 percentage points; and there’s only a 50% chance that the 2nd-leading company will attempt to align AI in a way we’d find satisfactory; therefore we think the least-risky plan is to stay at the front of the race and then bootstrap AI alignment.” (Or a more detailed version of that.)
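To make concrete the kind of quantification I have in mind, here is a minimal sketch in Python (point estimates rather than distributions, and not taken from the linked post or from my Squiggle model). Only the 30%, 3-percentage-point, and 50% figures come from the sentence above; the decomposition itself and the other parameters (p_global_pause_baseline, p_safe_given_global_pause) are hypothetical placeholders.

```python
# Toy expected-outcome comparison using the illustrative numbers from the
# comment above. The structure and the baseline/conditional probabilities
# are assumptions for illustration only.

p_bootstrap_works = 0.30        # P(bootstrapping AI alignment with AI succeeds)
p_second_co_tries = 0.50        # P(2nd-leading company attempts satisfactory alignment)
pause_boost = 0.03              # increase in P(global pause) from a unilateral pause
p_global_pause_baseline = 0.05  # assumed baseline P(global pause) -- hypothetical
p_safe_given_global_pause = 0.90  # assumed P(good outcome | global pause) -- hypothetical

# Strategy A: stay at the front of the race, then bootstrap alignment.
p_good_race = p_bootstrap_works

# Strategy B: unilaterally pause. The lead passes to the 2nd company; a good
# outcome then requires either a global pause, or the 2nd company both
# attempting alignment and the bootstrap working.
p_global_pause = p_global_pause_baseline + pause_boost
p_good_pause = (
    p_global_pause * p_safe_given_global_pause
    + (1 - p_global_pause) * p_second_co_tries * p_bootstrap_works
)

print(f"P(good outcome | race + bootstrap): {p_good_race:.2f}")
print(f"P(good outcome | unilateral pause): {p_good_pause:.2f}")
```

Under these toy numbers the race-and-bootstrap branch comes out ahead (~0.30 vs ~0.21), but the point is the structure of the comparison, not these particular outputs.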