Coalition members would each consolidate AI chips into a smaller number of declared data centers, where their usage can be monitored to provide assurance that they are only being used for allowed activities. The number of chips permitted in any one unmonitored facility would be strictly limited. (We suggest the equivalent of 16 H100 chips, a collection that would cost approximately $500,000 USD in 2025.)
This seems a bit too low to me. According to Epoch AI, the largest training runs currently take ~5e26 FLOP, so at 50% utilization this cluster would take[1] ~5e26/(0.5*16*1e15) s = 6.25e10 s ≈ 1981 years, far longer than the expected lifetime of the ban. Since current training runs have not yet caused a catastrophe, we should allow runs with somewhat less effective compute than today's frontier. For example, if we want it to take ~50 years to reach frontier LLMs, and conservatively assuming a 10x algorithmic improvement over existing frontier methods, we should allow clusters of ~64 H100 GPUs.
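Spelling the arithmetic out (a quick sketch; the ~1e15 FLOP/s per H100, 50% utilization, and 5e26 FLOP figures are the same assumptions as above):

```python
# Back-of-envelope: wall-clock time and cluster size for a frontier-scale run.
# Assumptions (as above): ~1e15 dense BF16 FLOP/s per H100, 50% utilization,
# and ~5e26 FLOP for today's largest training runs (Epoch AI estimate).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_train(total_flop, n_gpus, flop_per_gpu=1e15, utilization=0.5):
    """Wall-clock years for a run of total_flop on n_gpus H100s."""
    return total_flop / (utilization * n_gpus * flop_per_gpu) / SECONDS_PER_YEAR

def gpus_needed(total_flop, years, flop_per_gpu=1e15, utilization=0.5):
    """Cluster size needed to finish total_flop in the given number of years."""
    return total_flop / (utilization * flop_per_gpu * years * SECONDS_PER_YEAR)

print(years_to_train(5e26, n_gpus=16))   # ~1981 years on 16 H100s
print(gpus_needed(5e26 / 10, years=50))  # ~63, i.e. ~64 H100s, assuming a 10x algorithmic gain
```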
Thanks for your comment! Conveniently, we wrote this post about why we picked the training compute thresholds we did in the agreement (1e24 is the max). I expect you will find it interesting, as it responds to some of what you’re saying! The difference between 1e24 and 5e26 largely explains our difference in conclusions about what a reasonable unmonitored cluster size should be, I think.
You’re asking the right question here, and it’s one we discuss a little in the report (e.g., pp. 28, 54). One small note on the math: I think it’s probably better to use FP8 (so ~2e15 theoretical FLOP/s per H100, given the emergence of FP8 training).
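For concreteness (an illustrative recalculation, not a figure from the report), swapping in the FP8 number roughly halves the estimates above: 5e26/(0.5*16*2e15) s ≈ 3.1e10 s ≈ 990 years for a 16-H100 cluster, and ~32 rather than ~64 H100s for the 50-year example.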
Not to mention that frontier training runs would be impossible on this cluster due to memory constraints.
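As a rough back-of-envelope (my own assumptions, not figures from the report: a dense Chinchilla-optimal run at 5e26 FLOP and ~16 bytes of training state per parameter, ignoring activations and any offloading tricks):

```python
# Memory back-of-envelope for a frontier-scale run on 16 H100s.
# Assumptions: dense Chinchilla-optimal training (C ≈ 6*N*D with D ≈ 20*N,
# so N ≈ sqrt(C/120)) and ~16 bytes of state per parameter (fp16 weights +
# grads, fp32 master weights, Adam moments). Ignores activation memory and
# CPU/NVMe offloading.
C = 5e26                              # training compute, FLOP
n_params = (C / 120) ** 0.5           # ≈ 2e12 parameters
state_bytes = 16 * n_params           # ≈ 33 TB of weight/optimizer state
hbm_bytes = 16 * 80e9                 # 16 H100s x 80 GB HBM ≈ 1.28 TB

print(f"state ≈ {state_bytes/1e12:.0f} TB vs. HBM ≈ {hbm_bytes/1e12:.2f} TB "
      f"({state_bytes/hbm_bytes:.0f}x short)")
```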
We’ve had more than a 10x algorithmic improvement over the last 50 years.