This is a good question, thank you. (It’s an important topic which I won’t fully treat here.)
I guess it could be the case that the kind of intelligence that you need to engineer software and the kind that you need to develop novel algorithms are almost completely disjoint and unrelated.
One very cartoonish model, which I’d guess you already know, but just to lay it out:
Let’s say we have GI (general intelligence, like a human) and PI (performance intelligence). GI causes PI. Other things can cause PI. For example, board game AIs have PI through something spiritually equivalent to brute-force search.
Humans are born with HGI (human GI). During their lifetime, humans gain HPI (human PI), which is fairly meager in some sense. That proceeds by a combination of HGI → HPI and copying a lot of HPI from other humans. All humans apply quite a lot of HGI as kids. Humans who then reach the frontier of some field (which includes, for example, being a floor manager at a retail store) will largely apply their HPI, but still apply their HGI somewhat.
Gippities are trained using fragments of GI, but probably not full GI. It’s unclear how much is missing, and it’s hard to get good evidence about that. Gippities have a lot of PI, way more than any one human. They get that almost entirely by copying from HPI represented in training data. They get some additional PI from various sources: RLVR; the pretraining itself (e.g. gippities know a bunch of things about the distribution of human text that no human knows); and online reasoning (though of course there’s a memory problem, but that’s inessential). That additional PI has a different and less general distribution than HPI, because HPI comes from HGI, which is GI. Thus gippities have a confusing summation of general PI copied from HPI, plus other PI.
In humans, if you have HPI, then you have HGI. But it’s possible to get generally-distributed PI without having GI by copying HPI, and that’s what gippities do. It’s also possible to have originary PI that is not generated by GI at all, which gippities also have. Thus, confusingly, gippities have lots of PI, some of it originary and some of it generally-distributed, but plausibly / probably not caused by artificial GI.
On a psychologizing note, which I offer just as a hypothesis-piece to maybe track if you weren’t already (I think you’re pretty likely to be aware of this already, but from my perspective there’s a significant chance that you don’t think of it frequently enough): There’s a strong default toward over-interpreting things (behaviors, say) by presuming a human-shaped background mental context. E.g. people ask “does the LLM believe X”, even though that question probably doesn’t translate straightforwardly from humans to LLMs at all, and treating it as if it did would lead to incorrect inferences about which behaviors would be concomitant. Cf. https://www.lesswrong.com/posts/L2h9nAtPqEFK6atSJ/an-anthropomorphic-ai-dilemma#Gemini_modeling_with_alien_contexts_is_hard
In particular, when people imagine changes to gippity-based systems, such as “unhobbling” by adding tool access, they imagine that what gets opened up for the gippity is similar to what would be opened up for a human if you newly made that same change (e.g. gave a human that tool). I think this drives some “we’re close to having AGI” intuitions, and I think it’s mistaken.
I like and basically endorse your cartoon model!
Modulo one point: I think RLVR accounts for more of the capabilities, relative to copying humans, than you seem to think.[1]
Why are you emphasizing the pretraining instead of the RL?
Though Zack dropped a paper in this thread which looks relevant to that question.