I think we have two separate claims here:

(1) Do technologies that have lots of resources put into their development generally improve discontinuously or by huge slope changes?

(2) Do technologies often get displaced by technologies with a different lineage?
I agree with your position on (2) here. But the claim in the post, that sometime in the 2030s someone will make a single important architectural innovation that leads to takeover within a year, mostly depends on (1): it would require progress within that year to be comparable to all the progress from now until then. Also, you said the architectural innovation might be a slight tweak to the LLM architecture, which would mean it shares the same lineage.
The history of machine learning seems pretty continuous with respect to advance prediction. In the Epoch graph, the line fit to the loss of the best LSTM up to 2016 sees a slope change of less than 2x, whereas a hypothetical innovation that causes takeover within a year, with not much progress in the intervening 8 years, would imply a slope change of ~8x (eight years' worth of progress compressed into one). So it seems more likely to me (conditional on 2033 timelines and a big innovation) that we get some architectural innovation with a moderately different lineage in 2027, it overtakes transformers’ performance in 2029, and it afterward increases the rate of AI improvement by something like 1.5x-2x.
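To make the arithmetic explicit (my framing and made-up variable names, not from the original comment): if progress accrues at a roughly constant baseline rate, an innovation that compresses the remaining eight years of progress into one year implies roughly an 8x slope change, several times larger than the <2x change the LSTM trend saw.

```python
# Back-of-envelope sketch of the slope-change comparison above.
# Assumption: "progress" accrues at a constant baseline rate per year.

years_remaining = 8      # the "intervening 8 years" until a hypothetical 2033 takeover
compression_window = 1   # the innovation delivers all of that progress within a year

implied_slope_change = years_remaining / compression_window
print(implied_slope_change)  # -> 8.0, the ~8x figure in the comment

# Compare with the <2x slope change that the best-LSTM loss trend line
# saw in the Epoch data:
lstm_slope_change = 2.0
print(implied_slope_change / lstm_slope_change)  # -> 4.0, i.e. 4x larger
```

A simple ratio, but it is the load-bearing step: the hypothetical innovation would have to be roughly four times as trend-breaking as the largest slope change observed in the machine-learning record cited here.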
2 out of 3 of the technologies you listed probably have continuous improvement despite the lineage change
1910-era cars were only a little better than horses, and the slope of the overall speed at which someone could travel long distances in the US probably increased by <2x after cars, due to things like road-quality improvements before cars and improvements in ships and rail (though maybe railroads were a discontinuity, not sure)
Before modern refrigerators we had low-quality ammonia-based refrigerators that would contaminate the ice, and before those people shipped ice from Maine, so I would expect the cost/quality of refrigeration to show much less than an 8x slope change at the advent of mechanical refrigeration
Only rockets were actually a discontinuity
Tell me if you disagree.
Indeed, and I’m glad we’ve converged on (2). But...

Do technologies that have lots of resources put into their development generally improve discontinuously or by huge slope changes?
… On second thoughts, how did we get there? The initial disagreement was about how plausible it is for incremental changes to the LLM architecture to transform it into a qualitatively different type of architecture. It’s not about continuity-in-performance; it’s about continuity-in-design-space.
Whether finding an AGI-complete architecture would lead to a discontinuous advance in capabilities (FOOM/RSI/sharp left turn) is a completely different topic from how smoothly we should expect AI architectures’ designs to change. And on that topic: (a) I’m not very interested in reference-class comparisons, as opposed to direct gears-level modeling of this specific problem; (b) this is a bottomless rabbit hole and a long-standing disagreement, which I’m not interested in going into at this time.
2 out of 3 of the technologies you listed probably have continuous improvement despite the lineage change

That’s an interesting general pattern, if it checks out. Any guesses why that might be the case?
My instinctive guess is that new-paradigm approaches tend to start out promising in theory but initially very bad; people then tinker with prototypes, and the technology becomes commercially viable the moment it’s at least marginally better than the previous-paradigm SOTA. That would explain the apparent performance-continuity despite a lineage/paradigm-discontinuity.
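That crossover story can be illustrated with a toy model (entirely my construction, with made-up numbers): an old paradigm improving slowly and a new paradigm starting far worse but improving faster. The best-available-performance frontier stays continuous in level, even though its lineage and its slope both change at the moment the new paradigm overtakes the old one.

```python
# Toy illustration of performance-continuity despite paradigm-discontinuity.
# Numbers are arbitrary; only the qualitative shape matters.

def old_paradigm(t):
    return 10 + 1.0 * t   # incumbent: decent start, slow steady improvement

def new_paradigm(t):
    return 0 + 3.0 * t    # challenger: starts much worse, improves faster

# The frontier is whichever paradigm is best at each time step.
frontier = [max(old_paradigm(t), new_paradigm(t)) for t in range(11)]

# Crossover happens at t = 5, where both paradigms reach 15.
assert old_paradigm(5) == new_paradigm(5) == 15

# Step-to-step gains along the frontier: the slope changes at the
# crossover (1.0 -> 3.0), but there is no jump in the level itself.
jumps = [frontier[i + 1] - frontier[i] for i in range(10)]
print(jumps)  # -> [1.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0, 3.0]
```

Under this mechanism the lineage changes discontinuously while the measured performance trend shows only a slope change, matching the cars and refrigeration cases above.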