Reporting requirements for foundation models, triggered by highly reasonable compute thresholds.
I disagree, and I think the compute threshold is unreasonably high. I don’t even mean this in a “it is unreasonable because an adequate civilization would do way better” way– I mean it in a “I think our actual civilization, even with all of its flaws, could have been expected to do better” way.
There are very few companies training models above 10^20 FLOP, and it seems like it would be relatively easy to simply say “hey, we are doing this training run and here are some safety measures we are using.”
I understand that people are worried about overregulation and stifling innovation in unnecessary ways. But these are reporting requirements– all they do is require someone to inform the government that they are engaging in a training run.
Many people think that 10^26 FLOP has a non-trivial chance of creating xrisk-capable AGI in the next 3-5 years (especially as algorithms get better). But that’s not even the main crux for me– the main crux is that reporting requirements seem so low-cost relative to the benefit of the government being able to know what’s going on, track risks, and simply have access to information that could help it know what to do.
It also seems very likely to me that the public and the media would be on the side of a lower threshold. If frontier AI companies complained, I think it’s pretty straightforward to just be like “wait… you’re developing technology that many of you admit could cause extinction, and you don’t even want to tell the government what you’re up to?”
With all that said, I’m glad the EO uses a compute threshold in the first place (we could’ve gotten something that didn’t even acknowledge compute as a useful metric).
But I think 10^26 is extremely high for a reporting requirement, and I strongly hope that the threshold is lowered.
I think part 2, which details the reactions, will provide important color here: if this had impacted those other than the major labs right away, I believe the reaction would have been quite bad, and that setting it substantially lower would have been both a strategic error and a very hard sell to the Biden Administration. But perhaps I am wrong about that. They do reserve the ability to change the threshold in the future.
My guess is that the threshold is a precursor to more stringent regulation on people above the bar, and that it’s easier to draw a line in the sand now and stick to it. I feel pretty fine with it being so high.
Thank you for this, Zvi!