There are differences between ANNs and BNNs but they don’t matter that much—LLMs converge to learn the same internal representations as linguistic cortex anyway.
When humans make plans, the distribution they sample from has all sorts of unique and interesting properties that arise from various features of human biology and culture and the interaction between them. Big artificial neural nets will lack these features, so the distribution they draw from will be significantly different
LLMs and human brains learn from basically the same data, with similar training objectives, powered by universal approximations of Bayesian inference, and thus learn very similar internal functions/models.
Moravec was absolutely correct to use the term ‘mind children’ and all that implies. I outlined the case for why the human brain and DL systems are essentially the same way back in 2015, and every year since we have accumulated further confirming evidence. The closely related scaling hypothesis—predicted in that post—was extensively tested by OpenAI and worked at least as well as I predicted/expected, taking us to the brink of AGI.
LLMs:
learn very much like the cortex, converging to the same internal representations
acquire the same human cognitive biases and limitations
predictably develop human-like cognitive abilities with scale
are extremely human, not alien at all
That doesn’t make them automatically safe, but they are not potentially unsafe because they are alien.
LLMs and human brains learn from basically the same data, with similar training objectives, powered by universal approximations of Bayesian inference, and thus learn very similar internal functions/models.
This argument proves too much. A Solomonoff inductor (AIXI) running on a hypercomputer would also “learn from basically the same data” (sensory data produced by the physical universe) with “similar training objectives” (predict the next bit of sensory information) using “universal approximations of Bayesian inference” (a perfect approximation, in this case), and yet it would not be the case that you could then conclude that AIXI “learns very similar internal functions/models”. (In fact, the given example of AIXI is much closer to Rob’s initial description of “sampling from the space of possible plans, weighted by length”!)
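Rob’s quoted description can be made concrete with a toy sketch (the plan strings here are invented purely for illustration): sampling from a set of candidate plans with probability proportional to 2^-length, i.e. an explicit simplicity prior over plans:

```python
import random

def length_weighted_sample(plans, rng):
    """Sample a plan with probability proportional to 2**-len(plan),
    a crude discrete analogue of a simplicity (length) prior."""
    weights = [2.0 ** -len(p) for p in plans]
    total = sum(weights)
    r = rng.random() * total
    for plan, w in zip(plans, weights):
        r -= w
        if r <= 0:
            return plan
    return plans[-1]  # guard against floating-point underflow

plans = ["do nothing", "ask a human", "acquire resources then optimize"]
rng = random.Random(0)
counts = {p: 0 for p in plans}
for _ in range(10_000):
    counts[length_weighted_sample(plans, rng)] += 1
# Shorter plans dominate the sample; the longest plan is almost never drawn.
```

Note that even under this prior, what gets sampled depends entirely on what is in the candidate set and how plans are encoded—the prior alone tells you very little about the content of the plans.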
In order to properly argue this, you need to talk about more than just training objectives and approximations to Bayes; you need to first investigate the actual internal representations of the systems in question, and verify that they are isomorphic to the ones humans use. Currently, I’m not aware of any investigations into this that I’d consider satisfactory.
(Note here that I’ve skimmed the papers you cite in your linked posts, and for most of them it seems to me either (a) they don’t make the kinds of claims you’d need to establish a strong conclusion of “therefore, AI systems think like humans”, or (b) they do make such claims, but then the described investigation doesn’t justify those claims.)
Full Solomonoff induction on a hypercomputer absolutely does not just “learn very similar internal functions/models”; it effectively recreates actual human brains.
Full SI on a hypercomputer is equivalent to instantiating a computational multiverse and allowing us to access it. Reading out data samples corresponding to text from that is equivalent to reading out samples of actual text produced by actual human brains in other universes close to ours.
you need to first investigate the actual internal representations of the systems in question, and verify that they are isomorphic to the ones humans use.
This has been ongoing for over a decade (dating back at least to Sparse Coding as an explanation for V1).
But I will agree the bigger LLMs are now in somewhat different territory—more like human cortices trained for millennia, perhaps ten millennia for GPT-4.
Full Solomonoff induction on a hypercomputer absolutely does not just “learn very similar internal functions/models”; it effectively recreates actual human brains.
Full SI on a hypercomputer is equivalent to instantiating a computational multiverse and allowing us to access it. Reading out data samples corresponding to text from that is equivalent to reading out samples of actual text produced by actual human brains in other universes close to ours.
...yes? And this is obviously very, very different from how humans represent things internally?
I mean, for one thing, humans don’t recreate exact simulations of other humans in our brains (even though “predicting other humans” is arguably the high-level cognitive task we are most specced for doing). But even setting that aside, the Solomonoff inductor’s hypothesis also contains a bunch of stuff other than human brains, modeled in full detail—which again is not anything close to how humans model the world around us.
I admit to having some trouble following your (implicit) argument here. Is it that, because a Solomonoff inductor is capable of simulating humans, that makes it “human-like” in some sense relevant to alignment? (Specifically, that doing the plan-sampling thing Rob mentioned in the OP with a Solomonoff inductor will get you a safe result, because it’ll be “humans in other universes” writing the plans? If so, I don’t see how that follows at all; I’m pretty sure having humans somewhere inside of your model doesn’t mean that that part of your model is what ends up generating the high-level plans being sampled by the outer system.)
It really seems to me that if I accept what looks to me like your argument, I’m basically forced to conclude that anything with a simplicity prior (trained on human data) will be aligned, meaning (in turn) the orthogonality thesis is completely false. But… well, I obviously don’t buy that, so I’m puzzled that you seem to be stressing this point (in both this comment and other comments, e.g. this reply to me elsethread):
Note I didn’t actually reply to that quote. Sure, that’s an explicit simplicity prior. However, there’s a large difference under the hood between using an explicit simplicity prior on plan length vs. an implicit simplicity prior on the world and action models which generate plans. The latter is what is more relevant for intrinsic similarity to human thought processes (or not).
(to be clear, my response to this is basically everything I wrote above; this is not meant as its own separate quote-reply block)
you need to first investigate the actual internal representations of the systems in question, and verify that they are isomorphic to the ones humans use.
This has been ongoing for over a decade (dating back at least to Sparse Coding as an explanation for V1).
That’s not what I mean by “internal representations”. I’m referring to the concepts learned by the model, and whether analogues for those concepts exist in human thought-space (and if so, how closely they match each other). It’s not at all clear to me that this occurs by default, and I don’t think the fact that there are some statistical similarities between the high-level encoding approaches being used means that similar concepts end up being converged to. (Which is what is relevant, on my model, when it comes to questions like “if you sample plans from this system, what kinds of plans does it end up outputting, and do they end up being unusually dangerous relative to the kinds of plans humans tend to sample?”)
I agree that sparse coding as an approach seems to have been anticipated by evolution, but your raising this point (and others like it), seemingly as an argument that this makes systems more likely to be aligned by default, feels thematically similar to some of my previous objections—which (roughly) are that you seem to be taking a fairly weak premise (statistical learning models likely have some kind of simplicity prior built into their representation schema) and running with that premise wayyy further than I think is licensed—running, so far as I can tell, directly to the absolute edge of plausibility, with a conclusion something like “And therefore, these systems will be aligned.” I don’t think the logical leap here has been justified!
I think we are starting to talk past each other, so let me just summarize my position (and what I’m not arguing):
1.) ANNs and BNNs converge in their internal representations, in part because physics only permits a narrow Pareto-efficient solution set, but also because ANNs are literally trained as distillations of BNNs. (More widely known/accepted now, but I argued/predicted this well in advance, at least as early as 2015.)
2.) Because of 1.), there is no problem with ‘alien thoughts’ based on mindspace geometry. That was just never going to be a problem.
3.) Neither 1 nor 2 is sufficient for alignment by default—both points apply rather obviously to humans, who are clearly not aligned by default with other humans or humanity in general.
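Point 1.) can be illustrated with a minimal toy (this says nothing about real brains or LLMs; it just exhibits the convergence phenomenon in its simplest form): two learners with very different initializations, trained on the same data with the same objective, end up computing the same function:

```python
def train_linear(w0, b0, data, lr=0.05, steps=2000):
    """Fit y = w*x + b by stochastic gradient descent on squared error."""
    w, b = w0, b0
    for _ in range(steps):
        for x, y in data:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# Same data, same objective, very different starting points.
data = [(x, 3.0 * x - 1.0) for x in (-2, -1, 0, 1, 2)]
model_a = train_linear(10.0, -7.0, data)
model_b = train_linear(-5.0, 4.0, data)
# Both converge to (w, b) close to (3.0, -1.0): identical internal parameters.
```

The convergence here is trivial because the problem is convex; the substantive empirical claim in the thread is that something analogous happens for the highly non-convex models under discussion.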
Earlier you said:
A Solomonoff inductor (AIXI) running on a hypercomputer would also “learn from basically the same data” (sensory data produced by the physical universe) with “similar training objectives” (predict the next bit of sensory information) using “universal approximations of Bayesian inference” (a perfect approximation, in this case), and yet it would not be the case that you could then conclude that AIXI “learns very similar internal functions/models”.
I then pointed out that full SI on a hypercomputer would result in recreating entire worlds with human minds, but that was a bit of a tangent. The more relevant point is nuanced: AIXI is SI plus a reward function. So all the different possible AIXI agents share the exact same world model, yet they have different reward functions, and thus would generate different plans and may well end up killing each other or something.
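A toy sketch of that point (the states and rewards are invented for illustration, not real AIXI): two agents sharing one identical world model but scoring outcomes with different reward functions select different plans:

```python
# Shared world model: maps each candidate plan to its predicted outcome.
world_model = {
    "cooperate": "both_thrive",
    "defect": "win_alone",
    "idle": "status_quo",
}

def best_plan(reward):
    """Pick the plan whose predicted outcome maximizes this agent's reward."""
    return max(world_model, key=lambda plan: reward(world_model[plan]))

# Two agents: identical model of the world, different reward functions.
reward_a = {"both_thrive": 2, "win_alone": 1, "status_quo": 0}.get
reward_b = {"both_thrive": 1, "win_alone": 3, "status_quo": 0}.get

plan_a = best_plan(reward_a)  # "cooperate"
plan_b = best_plan(reward_b)  # "defect"
```

Same predictions about the world, opposite chosen plans—which is exactly why a shared world model is not sufficient for alignment.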
So having exactly the same world model is not sufficient for alignment—I’m not and would never argue that.
But if you train an LLM to distill human thought sequences, those thought sequences can implicitly contain plans, value judgements, or their equivalents. Thus LLMs can naturally align to human values to varying degrees, merely through their training as distillations of human thought. This by itself doesn’t guarantee alignment, but it is a much more hopeful situation to be in, because you can exert a great deal of control through control of the training data.
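The distillation mechanism can be sketched in miniature (a toy probability-matching exercise, not an actual LLM training setup): a student fit to minimize cross-entropy against a teacher's output distribution absorbs that distribution, whatever regularities it happens to encode:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Teacher distribution over three 'thoughts' (a toy stand-in for human text).
teacher = [0.7, 0.2, 0.1]
student_logits = [0.0, 0.0, 0.0]

# Gradient of cross-entropy H(teacher, student) w.r.t. logits is (student - teacher).
for _ in range(500):
    student = softmax(student_logits)
    student_logits = [l - 0.5 * (s - t)
                      for l, s, t in zip(student_logits, student, teacher)]

final_kl = kl(teacher, softmax(student_logits))
# final_kl is tiny: the student has absorbed the teacher's distribution.
```

On this picture, whatever the teacher’s distribution encodes (value judgements included) is what the student converges toward—which is why control of the training data is a meaningful lever.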
It’s all relative. “Are extremely human, not alien at all” --> Are you seriously saying that e.g. if and when we one day encounter aliens on another planet, the kind of aliens smart enough to build an industrial civilization, they’ll be more alien than LLMs? (Well, obviously they won’t have been trained on the human Internet. So let’s imagine we took a whole bunch of them as children and imported them to Earth and raised them in some crazy orphanage where they were forced to watch TV and read the internet and play various video games all day.)
Because I instead say that all your arguments about similar learning algorithms, similar cognitive biases, etc. will apply even more strongly (in expectation) to these hypothetical aliens capable of building industrial civilization. So the basic relationship of humans<aliens<LLMs will still hold; LLMs will still be more alien than aliens.
Are you seriously saying that e.g. if and when we one day encounter aliens on another planet, the kind of aliens smart enough to build an industrial civilization, they’ll be more alien than LLMs?
Yes! Obviously more alien than our LLMs. LLMs are distillations of aggregated human linguistic cortices. Anytime you train one network on the output of others, you clone/distill the original(s)! The algorithmic content of NNs is determined by the training data, and the data here in question is human thought.
This was always the way it was going to be, this was all predicted long in advance by the systems/cybernetics futurists like Moravec—AI was/will be our mind children.
EY misled many people here with the bad “human mindspace is narrow” meme. I mostly agree with Quintin’s recent takedown, but I of course also objected way back when.
I really don’t buy this. To be clear: Your answer is Yes, including in the variant case I proposed in parentheses, where the aliens were taken as children and raised in a crazy Earth orphanage?
I didn’t notice the part in parentheses at all until just now—was it added in an edit? To me, it really doesn’t fit the original question.
If you took alien children and raised them as earthlings you’d get mostly earthlings in alien bodies—given some assumptions: that they had roughly similar-sized brains and reasonably parallel evolution. Something like this has happened historically—when uncontacted tribal children are raised in a distant advanced civ, for example. Western culture—WEIRD—has so pervasively colonized and conquered much of the memetic landscape that we have forgotten how diverse human mindspace can be (in some sense it could be WEIRD that was the alien invasion).
Also, more locally on Earth: Japanese culture is somewhat alien compared to Western English/American culture. I expect actual alien culture to be more alien.
Nice to see us getting down to cruxes.
I’m pretty sure I didn’t edit it, I think that was there from the beginning.
OK, cool. So then you agree that LLMs will be more alien than aliens-who-were-raised-on-Earth-in-crazy-internet-text-pretraining-orphanage?
I don’t necessarily agree—as I don’t consider either to be very alien. Minds are software memetic constructs so you are just comparing human software running on GPUs vs human software running on alien brains. How different that is and which is more different than human software running on ape brains now depends on many cumbersome details.
How do we know that the human brain and LLMs converge to the same internal representations—is that addressed in your earlier write-up?
Yes—it was already known for vision back in that 2015 post, and in my later posts I revisit the issue here and later here.