Prior to having a complete version of this much more powerful AI paradigm, you’ll first have a weaker version of this paradigm (e.g. you haven’t figured out the most efficient way to implement the brain algorithm, etc.).
A supporting argument: Since evolution found the human brain algorithm, and evolution only does local search, the human brain algorithm must be built out of many innovations that are individually useful. So we shouldn’t expect the human brain algorithm to be an all-or-nothing affair. (Unless it’s so simple that evolution could find it in ~one step, but that seems implausible.)
Edit: Though in principle, there could still be a heavy-tailed distribution of how useful each innovation is, with one innovation producing most of the total value. (Even though the steps leading up to that were individually slightly useful.) So this is not a knock-down argument.
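(Here’s a quick toy simulation of my own, with made-up numbers, just to make the two points above concrete: a greedy local search that only keeps “innovations” which are individually useful, where the size of each innovation’s contribution is drawn from a heavy-tailed distribution.)

```python
import random

# Toy sketch (my own illustration, not a model of the brain or of evolution):
# a greedy local search over candidate "innovations". A candidate is kept only
# if it improves fitness on its own -- the sense in which local search can only
# accumulate individually useful steps. Per-candidate gains are heavy-tailed
# (Pareto, shape 0.5), so even though every kept step is individually useful,
# the single largest step typically ends up supplying a big chunk of the total
# value, which is the caveat in the Edit above.

random.seed(0)

fitness = 0.0
kept_gains = []
for _ in range(1_000):
    # Pareto(0.5) samples are >= 1 with median 4, so subtracting 4 makes
    # roughly half the candidates harmful; those get rejected.
    candidate_gain = random.paretovariate(0.5) - 4.0
    if candidate_gain > 0:
        fitness += candidate_gain
        kept_gains.append(candidate_gain)

print(f"innovations kept: {len(kept_gains)}")
print(f"total fitness gain: {fitness:.1f}")
print(f"share of total from the single largest innovation: {max(kept_gains) / fitness:.0%}")
```

The shape parameter 0.5 is arbitrary; the only point is that “each step is individually useful” and “one step supplies most of the value” are compatible.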
“Since evolution found the human brain algorithm, and evolution only does local search, the human brain algorithm must be built out of many innovations that are individually useful. So we shouldn’t expect the human brain algorithm to be an all-or-nothing affair.”
If humans are looking at parts of the human brain and copying them, then it’s quite possible that the last component we look at is the critical piece that nothing else works without. A modern steam engine was developed step by step from simpler and cruder machines. But if you take apart a modern steam engine and copy each piece, it’s likely that it won’t work at all until you add the final piece, depending on the order in which you recreate the pieces.
It’s also possible that rat brains have all the fundamental insights. To get from rats to humans, evolution needed to produce lots of genetic code that grew extra blood vessels to supply oxygen and prevented brain cancer. (Also, evolution needed to spend time on alignment.) A human researcher can just change one number, and maybe buy some more GPUs.
My claim was “I think that, once this next paradigm is doing anything at all that seems impressive and proto-AGI-ish,[12] there’s just very little extra work required to get to ASI (≈ figuring things out much better and faster than humans in essentially all domains).”
I don’t think anything about human brains and their evolution cuts against this claim.
If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said. There are lots of parts of the human brain that are doing essential-for-AGI stuff, but if they’re not in place, then you also fail to pass the earlier threshold of “impressive and proto-AGI-ish”, i.e. doing things that LLMs (and other existing techniques) cannot already do.
Or maybe your argument is “brain-like AGI will involve lots of useful components, and we can graft those components onto LLMs”? If so, I’m skeptical. I think the cortex is the secret sauce, and the other components are either irrelevant for LLMs, or things that LLM capabilities researchers already know about. For example, the brain has negative feedback loops, and the brain has TD learning, and the brain has supervised learning and self-supervised learning, etc., but LLM capabilities researchers already know about all those things, and are already using them to the extent that they are useful.
To be clear: I’m not sure that my “supporting argument” above addressed an objection to Ryan that you had. It’s plausible that your objections were elsewhere.
But I’ll respond with my view.
“If your argument is “brain-like AGI will work worse before it works better”, then sure, but my claim is that you only get “impressive and proto-AGI-ish” when you’re almost done, and “before” can be “before by 0–30 person-years of R&D” like I said.”
Ok, so this describes a story where there’s a lot of work to get proto-AGI and then not very much work to get superintelligence from there. But I don’t understand the argument for thinking this is the case, vs. thinking that there’s a lot of work to get proto-AGI and then also a lot of work to get superintelligence from there.
Going through your arguments in section 1.7:
“I think the main reason is what I wrote about the “simple(ish) core of intelligence” in §1.3 above.”
But I think what you wrote about the simple(ish) core of intelligence in §1.3 is compatible with there being (making up a number) 20 different innovations involved in how the brain operates, each of which gets you a somewhat smarter AI, and each of which could be individually difficult to figure out. So maybe you get a few, you have proto-AGI, and then it takes a lot of work to get the rest.
Certainly the genome is large enough to fit 20 things.
I’m not sure if the “6-ish characteristic layers with correspondingly different neuron types and connection patterns, and so on” is complex enough to encompass 20 different innovations. Certainly seems like it should be complex enough to encompass 6.
(My argument above was that we shouldn’t expect the brain to run an algorithm that is only useful once you have 20 hypothetical components in place, and does nothing beforehand. Because it was found via local search, each of the 20 things should be useful on its own.)
“Plenty of room at the top” — I agree.
“What’s the rate limiter?” — The rate limiter would be the thinking and experimenting needed to find the hypothesized 20 different innovations mentioned above. (What would you get if you only had some of the innovations? Maybe AGI that’s incredibly expensive. Or AGIs about as capable as unskilled humans.)
“For a non-imitation-learning paradigm, getting to “relevant at all” is only slightly easier than getting to superintelligence”
I agree that there are reasons to expect imitation learning to plateau around human-level that don’t apply to fully non-imitation learning.
That said...
For some of the same reasons that “imitation learning” plateaus around human level, you might also expect “the thing that humans do when they learn from other humans” (whether you want to call that “imitation learning” or “predictive learning” or something else) to slow down skill-acquisition around human level.
There could also be another reason why non-imitation-learning approaches could spend a long while in the human range. Namely: perhaps the human range is just pretty large, and so it takes a lot of gas to traverse. I think this is somewhat supported by the empirical evidence; see this AI Impacts page (discussed in this SSC post).