Good questions!
1) Is there another parameter for the delay (after the commercial release) to produce the hundreds of thousands of chips and build a supercomputer using them?
There’s no additional parameter, but once the delay is over it still takes months or years before enough copies of the new chip have been manufactured for it to be a significant fraction of total global FLOP/s.
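For intuition, here’s a toy ramp-up calculation; the installed base, per-chip FLOP/s, production rate, and the 10% “significant fraction” threshold below are all made-up placeholders, not numbers from the model:

```python
# Toy ramp-up: months until a newly released chip is a meaningful share of global FLOP/s.
# All numbers are illustrative placeholders, not estimates from the report.
global_flops = 1e21        # assumed existing installed base of FLOP/s
chip_flops = 3e14          # assumed FLOP/s per copy of the new chip
chips_per_month = 20_000   # assumed monthly production once the delay is over
target_share = 0.10        # assumed threshold for a "significant fraction"

months, new_flops = 0, 0.0
while new_flops < target_share * (global_flops + new_flops):
    new_flops += chips_per_month * chip_flops
    months += 1

print(f"~{months} months (~{months / 12:.1f} years) to reach a {target_share:.0%} share")
```

With these placeholders it takes roughly a year and a half, but the point is just that the answer is naturally measured in months to years rather than weeks.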
2) Do you think that in a scenario with quick large gains in hardware efficiency, the delay for building a new chip fab could be significantly larger than the current estimate because of the need to also build new factories for the machines that will be used in the new chip fab? (e.g. ASMI could also need to build factories, not just TSMC)
I agree with that. The 1-year delay was an average across improvements that do and don’t require new fabs to be built.
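As a rough illustration of that averaging (the 40/60 split and the per-scenario delays below are placeholder assumptions, not the report’s figures):

```python
# Hypothetical expected-delay calculation, mixing improvements that do and
# don't require building new fabs. All inputs are placeholder assumptions.
p_new_fab = 0.4        # assumed share of hardware improvements that need new fabs
delay_new_fab = 2.0    # assumed delay (years) when new fabs (and their suppliers) are needed
delay_existing = 0.3   # assumed delay (years) when existing fabs can be retooled

avg_delay = p_new_fab * delay_new_fab + (1 - p_new_fab) * delay_existing
print(f"{avg_delay:.2f} years")  # ~1 year with these placeholder inputs
```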
3) Do you think that these parameters/adjustments would significantly change the relative impact on the takeoff of the “hardware overhang” when compared to the “software overhang”? (e.g. maybe making hardware overhang even less important for the speed of the takeoff)
Yep, additional delays would raise the relative importance of software compared to hardware.
Hi Trent!
I think the review makes a lot of good points and am glad you wrote it.
Here are some hastily-written responses, focusing on areas of disagreement:
It is possible that AI-generated synthetic data will ultimately be higher quality than random Internet text. Still, I agree directionally about the data.
It seems possible to me that abstraction comes with scale. A lot of the problems you describe get much less bad with scale. And, at an abstract level, understanding causality deeply seems useful for predicting the next word on text you have not seen before, as models must do during training. Still, I agree that algorithmic innovations, for example relating to memory, may be needed to get to full automation, and that could delay things significantly.
I strongly agree that my GDP assumptions are aggressive and unrealistic, but I’m not sure it matters that much quantitatively. You are, of course, right about all of the feedback loops. I don’t think GDP being higher overall matters very much compared to the fraction of GDP invested. I think it will depend on whether people are willing to invest large fractions of GDP for the potential impact, or whether they need to see the impact there and then. If the delays you mention push back that wake-up, it will make a big difference; otherwise I think the difference is small.
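As a toy comparison of those two levers (the dollar figures and fractions below are placeholders I’ve assumed, not the report’s inputs):

```python
# Toy comparison: the invested fraction of GDP moves spending far more than the GDP level.
# All numbers are placeholder assumptions.
gdp = 100e12                      # assumed world GDP ($)
baseline = 0.001 * gdp            # 0.1% of GDP invested in AI
gdp_doubles = 0.001 * (2 * gdp)   # GDP doubles, fraction unchanged  -> 2x spending
fraction_rises = 0.01 * gdp       # fraction rises to 1%, GDP unchanged -> 10x spending

print(f"baseline ${baseline:.1e}, GDP doubles ${gdp_doubles:.1e}, fraction rises ${fraction_rises:.1e}")
```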
You may be right about the parallelization penalty, but I will share some context about that parameter which I think reduces the force of your argument. When I chose the parameters for the rate of increased investment, I was often thinking about how quickly you could, in practice, increase the size of the community of people working on the problem. That means I was not accounting for the fact that the average salary rises when spending in an area rises, and that salary rise will create the appearance of a large parallelization penalty. Another factor is that one contributor to the parallelization penalty is that the average quality of researchers decreases as the field grows. But when AI labor floods in, its average quality will not decrease as the quantity increases, so the parallelization penalty for AI will be lower. But perhaps my penalty is still too small.

One final point. If indeed the penalty is very low, then AGI will increase output by a huge amount through sheer parallelism. But you can also run fewer copies much faster in serial time, and if there is a large parallelization penalty, the benefit of running fewer copies faster will be massive. So a large parallelization penalty would increase the boost just as you get AGI, I believe.
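To make that last point concrete, here is a sketch under assumed functional forms: parallel labor contributes output proportional to N^p (smaller p = larger penalty), while serial speed-ups are treated as multiplying output with no penalty. The copy count, speed-up factor, and exponents are placeholders, not the report’s parameters:

```python
# Sketch under assumed functional forms: parallel output ~ N**p, serial speed unpenalized.
# Placeholder numbers throughout; not the report's parameters.

def output_all_parallel(n_copies: float, p: float) -> float:
    """Output from running every copy at 1x human speed."""
    return n_copies ** p

def output_fewer_faster(n_copies: float, speedup: float, p: float) -> float:
    """Output from running n_copies / speedup copies at `speedup`-times human speed,
    assuming serial speed multiplies output without a parallelization penalty."""
    return speedup * (n_copies / speedup) ** p

N = 1_000_000  # assumed number of AGI copies affordable at human speed
k = 100        # assumed serial speed-up available by running fewer copies

for p in (0.9, 0.5):  # small penalty vs large penalty
    parallel = output_all_parallel(N, p)
    faster = output_fewer_faster(N, k, p)
    print(f"p={p}: all-parallel={parallel:.3g}, fewer-but-faster={faster:.3g}, "
          f"ratio={faster / parallel:.1f}x")
```

With these placeholder numbers, the advantage of trading copies for serial speed grows from roughly 1.6x (small penalty) to 10x (large penalty), which is the sense in which a larger penalty increases the boost right as you get AGI.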