To be clear, I think that basically any architecture is technically sufficient if you scale it up enough. Take ChatGPT, make it enormous, throw oceans of data at it, and allow it to store gigabytes of linguistic information: scale that up far enough and it’s eventually a recipe for superintelligent AI. My intuition so far, though, is that we basically haven’t made any progress when it comes to deep thinking, and as soon as LLMs have to deviate from learned patterns/heuristics, they hit a wall and become about as smart as a ~shrimp. So I estimate the scale required to actually get anywhere with the current architecture is just too high.
I think new architectures are needed. I don’t think it will be as straightforward as “just use recurrence/neuralese”, though moving away from the limitations of LLMs will be a necessary step. I think I’m going to write a follow-up blog post clarifying some of the limitations of the current architecture and why I think the problem is really hard, not just a straightforward matter of scaling up. I think it’ll look something like:
Each deep problem is its own exponential space, and exploring exponential spaces is very computationally expensive. We don’t do that when running LLMs in a single pass. We barely do it when running with chain of thought or whatever. We only really do it during training, and training is computationally very expensive precisely because exploring exponential spaces is very computationally expensive. We should expect an AI that can generically solve deep problems to be very computationally expensive to run, let alone to train. There isn’t a cheap, general-purpose strategy for solving exponential problems, so you can’t necessarily re-use progress on one to help with another. An AI that solves a new exponential problem will have to do the same kind of deep thinking AlphaGo Zero did in training when it played many games against itself, learning patterns and heuristics in the process. And that was a best case, because you can simulate games of Go; most problems we want to solve are not simulatable, so you have to explore the exponential space in a much slower, more expensive way. (A toy back-of-the-envelope sketch of that blow-up is below.)
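To make the “exponential space” cost claim a bit more concrete, here’s a minimal toy sketch in Python. The branching factor and depths are made-up numbers for illustration only; this isn’t modeling any real system or AlphaGo Zero’s actual algorithm, just showing how fast naive exhaustive exploration blows up, and how much a learned heuristic that prunes the branching factor buys you.

```python
# Toy sketch: cost of naively exploring a search tree with branching factor b
# to depth d, versus the same tree after a (hypothetical) learned heuristic
# prunes most moves. All numbers are made up for illustration.

def nodes_visited(b: int, d: int) -> int:
    """Positions touched by exhaustive search: 1 + b + b^2 + ... + b^d."""
    return sum(b ** k for k in range(d + 1))

if __name__ == "__main__":
    full_branching = 30     # pretend each position has ~30 legal moves
    pruned_branching = 3    # pretend a learned heuristic keeps only ~3 candidates
    for depth in (5, 10, 20, 40):
        full = nodes_visited(full_branching, depth)
        pruned = nodes_visited(pruned_branching, depth)
        print(f"depth {depth:>2}: exhaustive ~{full:.2e} nodes, "
              f"heuristic-pruned ~{pruned:.2e} nodes")
```

At depth 40 the exhaustive count is on the order of 10^59 nodes, i.e. hopeless, while pruning to a handful of candidate moves keeps it merely enormous. The point of the sketch is that the heuristics doing the pruning had to come from somewhere, and in AlphaGo Zero’s case they came from vast amounts of cheap simulated self-play, which most real-world problems don’t offer.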
And btw I think LLMs currently mostly leverage existing insights/heuristics present in the training data. I don’t think they’re bringing much insight of their own right now, even during training. But that’s just my gut feel.
I think we can eventually make the breakthroughs necessary and get to the scale necessary for this to work, but I don’t see it happening in five years or whatever.