I’m leaving the same comment here and in reply to Daniel on my blog.
First, thank you for engaging in good faith and rewarding deep critique. Hopefully this dialogue will help people understand the disagreements over AI development and modelling better, so they can make their own judgements.
I think I’ll hold off on replying to most of the points there, and make my judgement after Eli does an in-depth writeup of the new model. However, I did see that there was more argumentation over the superexponential curve, so I’ll try out some more critiques here: I’m not as confident about these, but hopefully they spark discussion.
The impressive achievements in LLM capabilities since GPT-2 have been driven by many factors, such as drastically increased compute, drastically increased training data, algorithmic innovations such as chain-of-thought, increases in the AI workforce, etc. The extent to which each contributes is a matter of debate, which we can save for when you properly write up your new model.
Now, let’s look for a second at what happens when the curve goes extreme: using median parameters and starting the superexponential today, the time horizon of AI would improve from one thousand work-years to ten thousand work-years in around five weeks. So you release a model, and it scores 80% on 1,000-work-year tasks but only about 40% on 10,000-work-year tasks (the current ratio of 50% to 80% time horizons is something like 4:1). Then five weeks later you release a new model, and now the reliability on the much harder tasks has doubled to 80%.
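To make the arithmetic concrete, here’s a minimal sketch of a superexponential time-horizon curve, where each successive doubling of the horizon takes a fixed fraction less calendar time than the previous one. The starting horizon, initial doubling time, and per-doubling shrink below are made-up illustrative numbers, not the model’s actual median parameters, but they reproduce the qualitative point: the jump from a 1,000-work-year horizon to a 10,000-work-year horizon compresses into a few weeks.

```python
# Illustrative sketch of a superexponential time-horizon curve: each doubling
# of the horizon takes a fixed fraction less calendar time than the last.
# All parameters below are made-up for illustration, not the model's medians.

WORK_YEAR_HOURS = 2000.0

horizon_hours = 8.0          # assumed starting horizon: one work day
doubling_time_days = 120.0   # assumed initial doubling time: ~4 months
shrink_per_doubling = 0.15   # assumed: each doubling is 15% faster than the last

elapsed_days = 0.0
days_at_1k = None

while horizon_hours < 10_000 * WORK_YEAR_HOURS:
    elapsed_days += doubling_time_days
    horizon_hours *= 2
    doubling_time_days *= 1 - shrink_per_doubling
    if days_at_1k is None and horizon_hours >= 1_000 * WORK_YEAR_HOURS:
        days_at_1k = elapsed_days

weeks_1k_to_10k = (elapsed_days - days_at_1k) / 7
print(f"1,000 -> 10,000 work-year horizon in ~{weeks_1k_to_10k:.0f} weeks")
```

With these toy numbers the last few doublings take days rather than months, which is the mechanism behind the five-week figure; the question is what real-world process those last few doublings are supposed to correspond to.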
Why? What causes the reliability to shoot up in five weeks? The change in the amount of available compute, reference data, or labor force will not be significant in that time, and algorithmic breakthroughs do not come with regularity. It can’t be due to algorithmic speedups from AI-driven AI development, because that’s in a different part of the model: we’re talking about five weeks of normal AI development, as it’s currently done at OpenAI. If the AI is only 30x faster than humans, then the time required for the AI to do the thousand-work-year task serially is about 33 years! So where does this improvement come from? Will we have developed the perfect algorithm, such that AI no longer needs retraining?
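The serial-time point is just a division, but it’s worth making explicit (the 30x figure is an assumption for illustration, as above):

```python
# Sanity check on the serial-time point: even an AI working 30x faster than
# a human (an assumed figure) needs decades of calendar time to grind
# through a 1,000-work-year task end to end.
task_work_years = 1_000
ai_speedup_vs_human = 30
print(f"~{task_work_years / ai_speedup_vs_human:.0f} calendar years of serial work")  # ~33
```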
I think a mistake could be made in trying to transfer intuition about humans to AI here: perhaps the intuition is “hey, a human who is good enough to do a 1-year task well can probably be trusted to do a 10-year task”.
However, if a human is trying to reliably do a “100 year” task (a task that would take a team of a hundred about a year to do), this might involve spending several years getting an extra degree in the subject, reading a ton of literature, improving their productivity, getting mentored by an expert in the subject, etc. While they work on it, they learn new things and their actual neurons get rewired.
But the AI equivalent of this would be getting new algorithms, new data, new computing power, new training: i.e., becoming an entirely new model, which would take significantly more than a few weeks to build. I think there may be some double counting going on between this superexponential and the superexponential from algorithmic speedups.