I have very different intuitions about 50M GPUs for 1 week vs 200k GPUs with 200 hours of work spread evenly across 50 years.
| | SlowCorp v1 | SlowCorp v2 | NormalCorp v1 | NormalCorp v2 | AutomatedCorp |
|---|---|---|---|---|---|
| Time to work on AI R&D | 50 years | 50 years | 50 years | 50 years | 50 years |
| Number of AI researchers and engineers | 800 | 800 | 4,000 | 4,000 | 200,000 |
| Researcher/engineer quality | Median frontier AI company researcher/engineer | Median frontier AI company researcher/engineer | Similar to current frontier AI companies if they expanded rapidly | Similar to current frontier AI companies if they expanded rapidly | Level of world’s 100 best researchers/engineers |
| Time worked | One week of 24/7 work (or four weeks at 40h/week, but the GPUs are paused while the workers aren’t working) | 50 years of one 4-hour session per year | One year of 24/7 (or four years of 40h/week, but the GPUs are paused while the workers aren’t working) | 50 years of 40 hours/week for 1 month per year | 50 years of 24/7 |
| H100s | 500,000,000 | 200,000 | 10,000,000 | 200,000 | 200,000 |
| Cumulative H100-years | 10 million | 10 million | 10 million | 10 million | 10 million |
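As a quick sanity check on the compute column, here's a back-of-the-envelope calculation (my own arithmetic, not from the comment), assuming the GPUs run for one week in SlowCorp v1, one year in NormalCorp v1, and the full 50 years in the other three columns:

```python
# Cumulative compute = number of H100s x years the GPUs are actually running.
scenarios = {
    "SlowCorp v1":   (500_000_000, 1 / 52),  # GPUs only run for the one week of work
    "SlowCorp v2":   (200_000, 50),
    "NormalCorp v1": (10_000_000, 1),        # GPUs only run for the one year of work
    "NormalCorp v2": (200_000, 50),
    "AutomatedCorp": (200_000, 50),
}
for name, (h100s, years) in scenarios.items():
    print(f"{name}: {h100s * years / 1e6:.1f}M H100-years")
# Every column comes out at roughly 10M H100-years, so the compute budget is held fixed.
```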
I think SlowCorp-v2 would get a lot more done than SlowCorp-v1 (though obviously still a lot less than AutomatedCorp). SlowCorp-v2 also seems like a closer analogy than SlowCorp-v1: it and AutomatedCorp have the same amount of serial time, and my intuition is that you generally can’t make a training run go 10x faster just by throwing 10x as many GPUs at it, because you’ll be bottlenecked by IO.
And I know “SlowCorp is bottlenecked by IO” is not the point this intuition pump was supposed to make, but at least for me, it ended up being the main consideration pumping my intuition.
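For intuition on why 10x as many GPUs generally doesn't buy a 10x faster run, here's a minimal Amdahl's-law-style sketch (my own illustration, not from the thread), assuming some fixed fraction of each training step is serial or IO/communication-bound and doesn't shrink as you add accelerators:

```python
def speedup(gpu_multiplier: float, serial_fraction: float) -> float:
    """Amdahl-style speedup from multiplying GPU count, if `serial_fraction`
    of the work (IO, communication, etc.) doesn't parallelize."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / gpu_multiplier)

for s in (0.0, 0.05, 0.2):
    print(f"serial fraction {s:.0%}: 10x GPUs -> {speedup(10, s):.1f}x faster")
# serial fraction 0%: 10x GPUs -> 10.0x faster
# serial fraction 5%: 10x GPUs -> 6.9x faster
# serial fraction 20%: 10x GPUs -> 3.6x faster
```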
Yeah, I discuss this here:

The way I set up the analogy makes it seem like AutomatedCorp has a serial compute advantage: because they have 50 years, they can run things that take many serial years while NormalCorp can’t. As in, the exact analogy implies that they could use a tenth of their serial time to run a 5-year-long training run on 50k H100s, while in reality this would only be possible if the run was sufficiently parallelizable that it could be done on 2.5 million H100s in a tenth of a year. So, you should ignore any serial compute advantage. Similarly, you should ignore difficulties that SlowCorp might have in parallelizing things sufficiently, etc.
You can also imagine that SlowCorp has 10 million magically good GPUs (and CPUs etc) which are like H100s but 50x serially faster (but still only has 1 week) while AutomatedCorp has 10 million much worse versions of H100s (and CPUs etc) which are 50x serially slower but otherwise the same (and has 50 years still).
Also SlowCorp has magically 50x better networking equipment than NormalCorp, and 50x higher rate limits on every site they’re trying to scrape, and 50x as much sensor data from any process in the world, and 50x faster shipping on any physical components they need, etc etc (and AutomatedCorp has magically 50x worse of all of those things).
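To make the serial-compute arithmetic in the quoted passage explicit (my own worked check of the numbers above): a 5-year run on 50k H100s and a 0.1-year run on 2.5 million H100s are the same total compute; the analogy only carries over if the workload actually parallelizes that far.

```python
# Same H100-years either way; only the wall-clock shape differs.
serial_shape   = 50_000 * 5          # 5-year run on 50k H100s (a tenth of AutomatedCorp's 50 years)
parallel_shape = 2_500_000 * 0.1     # 0.1-year run on 2.5M H100s (a tenth of NormalCorp's year)
print(serial_shape, parallel_shape)  # 250000 250000.0
```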
But yeah, agreed that you should ignore all of those intuitions when considering the “1 week” scenario—I just found that I couldn’t actually turn all of those intuitions off when considering the scenario.
Yep, but my understanding is that the time associated with marginal scraping, sensor data, and physical components doesn’t matter much when talking about AI progress on the order of a year. Or honestly, maybe marginal improvements in these sorts of components don’t matter that much at all over this time scale (like, freezing all these things for a year wouldn’t be much of a tax if you prepped in advance). Not super sure about the situation with scraping, though.