Thanks! Here’s a partial response, as I mull it over.
Also, I’d note that the brain seems way more complex than LLMs to me!
See “Brain complexity is easy to overstate” section here.
basically all paradigms allow for mixing imitation with reinforcement learning
As in §2.3.2, if an LLM sees output X in context Y during pretraining, it will automatically start outputting X in context Y. Whereas if smart human Alice hears Bob say X in context Y, Alice will not necessarily start saying X in context Y. Instead she might say “Huh? Wtf are you talking about, Bob?”
Let’s imagine installing an imitation learning module in Alice’s brain that makes her reflexively say X in context Y upon hearing Bob say it. I think I’d expect that module to hinder her learning and understanding, not accelerate it, right?
(If Alice is able to say to herself “in this situation, Bob would say X”, then she has a shoulder-Bob, and that’s definitely a benefit, not a cost. But that’s predictive learning, not imitative learning. No question that predictive learning is helpful. That’s not what I’m talking about.)
…So there’s my intuitive argument that the next paradigm would be hindered rather than helped by mixing in some imitative learning. (Or I guess more precisely, as long as imitative learning is part of the mix, I expect the result to be no better than LLMs, and probably worse. And as long as we’re in “no better than LLM” territory, I’m off the hook, because I’m only making a claim that there will be little R&D between “doing impressive things that LLMs can’t do” and ASI, not between zero and ASI.)
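To make the imitative/predictive distinction concrete, here’s a toy sketch of the two setups. The two-head architecture and all names below are illustrative assumptions on my part, not a claim about how any real system (or the brain) is organized: in the imitative setup, Bob’s outputs directly train the policy that generates Alice’s own behavior, while in the predictive setup they only train a separate “model of Bob” that the policy can consult.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy illustration only: a shared encoder with two heads.
# "policy_head" produces the agent's own utterances;
# "bob_model_head" merely predicts what Bob would say.

vocab_size, d_model = 1000, 64

encoder = nn.Embedding(vocab_size, d_model)        # stands in for "context Y"
policy_head = nn.Linear(d_model, vocab_size)       # what the agent itself says
bob_model_head = nn.Linear(d_model, vocab_size)    # prediction of what Bob says

def imitative_loss(context_tokens, bob_tokens):
    """Imitation: Bob's outputs are targets for the agent's own policy, so
    seeing Bob say X in context Y directly pushes the agent to say X in context Y."""
    h = encoder(context_tokens).mean(dim=1)
    return F.cross_entropy(policy_head(h), bob_tokens)

def predictive_loss(context_tokens, bob_tokens):
    """Prediction: Bob's outputs only train a separate model of Bob (a "shoulder-Bob").
    The policy head is untouched; the agent can consult the prediction without copying it."""
    h = encoder(context_tokens).mean(dim=1)
    return F.cross_entropy(bob_model_head(h), bob_tokens)

# e.g. a batch of 8 contexts of length 16, and the token Bob emitted in each:
ctx = torch.randint(0, vocab_size, (8, 16))
bob = torch.randint(0, vocab_size, (8,))
print(imitative_loss(ctx, bob).item(), predictive_loss(ctx, bob).item())
```

In LLM pretraining the two collapse into one, since the next-token predictor is also the generator, which is exactly the property being pointed at here.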
Notably, in the domains of chess and Go it actually took many years to make it through the human range. And it was possible to leverage imitation learning and human heuristics to perform quite well at Go (and chess) in practice, up to systems which weren’t that much worse than humans.
In my mind, the (imperfect!) analogy here would be (LLMs, new paradigm) ↔ (previous Go engines, AlphaGo and successors).
In particular, LLMs today are in many (not all!) respects “in the human range” and “perform quite well” and “aren’t that much worse than humans”.
algorithmic progress
I started writing a reply to this part … but first I’m actually kinda curious what “algorithmic progress” has looked like for LLMs, concretely—I mean, the part where people can now get the same results from less compute. Like what are the specific things that people are doing differently today than in 2019? Is there a list somewhere? A paper I could read? (Or is it all proprietary?) (Epoch talks about how much improvement has happened, but not what the improvement consists of.) Thanks in advance.
See “Brain complexity is easy to overstate” section here.
Sure, but I still think it’s probably way more complex than LLMs even if we’re just looking at the parts key for AGI performance (in particular, the parts which learn from scratch). And my guess would be that performance is greatly degraded if you only allow as much complexity as the core LLM learning algorithm has.
Let’s imagine installing an imitation learning module in Alice’s brain that makes her reflexively say X in context Y upon hearing Bob say it. I think I’d expect that module to hinder her learning and understanding, not accelerate it, right?
This isn’t really what I’m imagining, nor do I think this is how LLMs work in many cases. In particular, LLMs can transfer from training on random github repos to being better in all kinds of different contexts. I think humans can do something similar, but have much worse memory.
I think in the case of humans and LLMs, this is substantially subconscious/non-explicit, so I don’t think this is well described as having a shoulder-Bob.
Also, I would say that humans do learn from imitation! (You can call it prediction, but it doesn’t matter what you call it as long as it implies that data from humans makes things scale more continuously through the human range.) I just think that you can do better at this than humans, based on the LLM case, mostly because humans aren’t exposed to as much data.
Also, I think the question is “can you somehow make use of imitation data”, not “can the brain’s learning algorithm immediately make use of imitation”?
In my mind, the (imperfect!) analogy here would be (LLMs, new paradigm) ↔ (previous Go engines, AlphaGo and successors).
Notably this analogy implies LLMs will be able to automate substantial fractions of human work prior to a new paradigm which (over the course of a year or two and using vast computational resources) beats the best humans. This is very different from the “brain in a basement” model IMO. I get that you think the analogy is imperfect (and I agree), but it seems worth noting that the analogy you’re drawing suggests something very different from what you expect to happen.
Is there a list somewhere? A paper I could read? (Or is it all proprietary?)
It’s substantially proprietary, but you could consider looking at the DeepSeek-V3 paper. We don’t actually have a great understanding of the quantity and nature of algorithmic improvement after GPT-3. It would be useful for someone to do a more up-to-date review based on the best available evidence.
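For a concrete flavor of what one publicly documented change looks like (just one example, and not a claim about how much of the measured efficiency gain it accounts for): DeepSeek-V3 is a sparse mixture-of-experts model, where each token is routed to only a few expert MLPs, so total parameter count can grow much faster than per-token compute. A toy sketch with made-up dimensions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Toy sparse mixture-of-experts layer: each token is routed to its
    top-k experts, so per-token FLOPs scale with k, not with num_experts."""
    def __init__(self, d_model=64, d_ff=256, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (num_tokens, d_model)
        scores = self.router(x)                # (num_tokens, num_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e       # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot:slot + 1] * expert(x[mask])
        return out

layer = ToyMoELayer()
tokens = torch.randn(10, 64)
print(layer(tokens).shape)                     # torch.Size([10, 64])
```

Other publicly discussed changes in the same vein include things like multi-query/grouped-query attention and better data curation, but as noted, the relative contributions aren’t well characterized publicly.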
I’m not sure that complexity is protecting us. On the one hand, there’s just ~1 MB of bases coding for the brain (and less for the connectome; source: https://xkcd.com/1605/), but that doesn’t mean we can read it, and it may take a long time to reverse engineer.
On the other hand, our existing LLM systems are already much more complex than that: likely more than a GB of source code for the servers in a modern LLM-running compute center. And here the relationship between the code and the result is better known and can be iterated on much faster. We may not need to reverse engineer the brain; experimentation may be sufficient.
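The back-of-envelope numbers behind this comparison, for what they’re worth (the ~1 MB and >1 GB figures are the estimates above, not something computed here):

```python
# Rough information-content comparison (order-of-magnitude only).
human_genome_bases = 3.1e9          # ~3.1 billion base pairs in the human genome
bits_per_base = 2                   # 4 possible bases -> 2 bits each
genome_bytes = human_genome_bases * bits_per_base / 8
print(f"whole genome: ~{genome_bytes / 1e6:.0f} MB")   # ~775 MB, uncompressed

brain_spec_bytes = 1e6              # the ~1 MB brain-relevant subset claimed above
llm_stack_bytes = 1e9               # the ">1 GB of source code" estimate above
print(f"LLM stack / brain spec ratio: ~{llm_stack_bytes / brain_spec_bytes:.0f}x")
```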