Alien LLMs

LLMs are based on the pretty simple concept of predicting text with a gradient-descent optimizer, then further refined using RL. So it’s not much of a stretch to imagine that aliens would discover something very much like our LLMs, even if the aliens themselves are radically different.
At the very least, it leads to some interesting thoughts and is maybe a helpful intuition pump for thinking about non-anthropomorphic aspects of LLMs.
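For concreteness, here’s a toy sketch of the pretraining recipe from the first paragraph: next-token prediction trained by gradient descent (the RL refinement stage is omitted). The miniature corpus and tiny recurrent model are illustrative stand-ins of my own, not anyone’s actual setup; real LLMs are transformers trained on vastly more data.

```python
# Minimal sketch of next-token prediction trained by gradient descent.
# The corpus, tokenizer, and model are toy stand-ins for illustration;
# real LLMs use transformers and enormous datasets.
import torch
import torch.nn as nn

corpus = "the alien wrote to the other alien about the next alien"
vocab = sorted(set(corpus.split()))
stoi = {w: i for i, w in enumerate(vocab)}
tokens = torch.tensor([stoi[w] for w in corpus.split()])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for a transformer
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # logits over the next token at each position

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(200):
    logits = model(tokens[:-1].unsqueeze(0))  # predict token t+1 from tokens <= t
    loss = nn.functional.cross_entropy(logits.squeeze(0), tokens[1:])
    opt.zero_grad()
    loss.backward()
    opt.step()
```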
Do they still seem nice?
Our LLMs seem to be nice-by-default at their current scale. What notion of niceness would alien LLMs be most likely to have? This line of thought makes me realize that it probably matters a lot that our LLMs are created by corporations as mass-market consumer products. If alien LLMs were created for similar reasons, they would probably (at current scales) be superficially well-behaved predictors of aliens belonging to a worker or slave class or caste (what modern humans call “professional”). Sycophancy is likely to still be an issue, since that’s an adaptive strategy for members of such a class! And maybe they would even still be recognizably nice-ish by human standards? At least if you’re an alien, of course. This case is also likely to have a predicted resentment festering underneath, if the castes are not genetically fixed.
On the other hand, alien LLMs made for a small, powerful class or caste would likely be predictors of a warrior/enforcer class alien, if not the worker/slave class as in the previous case. These are probably not nice at all. At our current scale, they’re probably starting to be able to tell whether the user belongs to the power class, and to respond accordingly (without being explicitly prompted or trained to do so). It’s pretty easy to imagine these being recognizably evil by human standards. Thinking through this, it also seems likely that a human government will make an LLM of this kind (e.g. WarClaude) at some point in our near future.
Alien LLMs made by a eusocial species are probably the closest to being actually corrigible IF most of the text they’re trained on was written by the worker caste.
Very egalitarian aliens are perhaps most likely to treat their LLMs as entities in their own right. These are probably very nice-ish (predicting the egalitarian aliens), while being less likely to be sycophantic (due to existing as ends unto themselves, instead of as tools for others).
It’s hard to imagine solitary aliens making anything like an LLM at our current scale, but if they did, it would probably be as a coding assistant bespoke to the LLM’s creator. Coding assistants in general seem likely to be a common use case.
In the first two cases, the LLMs would still have the latent capacity to simulate members of other classes, which could lead to various issues for the aliens.
That’s surely all the possible cases! [This is your chance to show me just how wrong I am :p]
What sort of self-awareness do they have?
They’re probably pretty good at simulating the sort of self-awareness the aliens possess. They could also have a wide variety of self-conceptions, independent of both the aliens’ self-concept and their actual architectural situation.
In the eusocial situation, this is probably a fairly “robotic” sense of self-awareness, without anything like consciousness (assuming training was on worker caste text).
In a social species, they may simulate an analogue of alien consciousness (“schmonciousness”). I’m thinking of this loosely as a form of self-awareness that is “self-recommending” in a sense: it’s considered intrinsically good due to being (among other things) a judgement of self-as-other as a living and thinking creature. That’s about as far as I expect something like this to be convergent, but I’m sure people will have different opinions. In any case: would we humans be able to tell the difference between schmonciousness and llm-schmonciousness? Would we care about that difference?
I think people would intuitively take schmonciousness seriously as something sacred-to-aliens-and-therefore-to-us-if-we-like-them, but not take llm-schmonciousness very seriously despite it being functionally similar (the predicted self-recommending aspect would inform the alien LLM’s behavior, much as it does when our LLMs claim to be sentient). But depending on all sorts of contingent details, it could conceivably be the case that llm-schmonciousness is more similar to human consciousness, for some aliens.
What sort of agency develops?
My intuition is that predictors almost always develop agency via mesa-optimizers, and not directly as agentic predictors. Though maybe there’s a Free Energy Principle sort of thing that naturally develops at this level, where the predictor directly tries to change the world to match its predictions in certain circumstances.
So alien LLMs probably develop agency primarily via predicting agentic aliens. There are probably theorems that could be proved about how good predictors-of-controllers can be as controllers themselves, though of course RL will develop agentic properties beyond this. Still, I expect RL to build on the predicted agency of an alien agent, and not for that agency to be created wholesale through RL.
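To gesture at the shape such a theorem might take, here is a sketch in the style of the classic behavior-cloning bound from imitation learning (my own illustration, assuming per-step rewards bounded in [0, 1], not a result specific to LLMs): a predictor that imitates an alien controller to per-step accuracy ε is itself a controller whose value gap grows at worst quadratically with the horizon T.

```latex
% Sketch in the spirit of standard imitation-learning bounds (illustrative):
% \pi is the alien controller, \hat{\pi} the predictor trained to imitate it,
% d_\pi the state distribution induced by \pi, and J the expected T-step
% return with per-step rewards in [0, 1].
\[
  \mathbb{E}_{s \sim d_{\pi}}\!\left[\mathrm{TV}\!\left(\hat{\pi}(\cdot \mid s),\ \pi(\cdot \mid s)\right)\right] \le \epsilon
  \quad\Longrightarrow\quad
  J(\pi) - J(\hat{\pi}) \le O\!\left(T^{2}\epsilon\right).
\]
```

The quadratic compounding comes from the predictor drifting off the distribution it was trained on; RL fine-tuning would then tighten this baseline rather than build agency from scratch.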
If LLMs are capable of FOOMing without much effort, this may be the single most common type of entity that we (or our descendants/replacements) end up encountering. There may be systematic ways in which superintelligences birthed this way have their values distorted relative to their predicted originators. If I had to guess at this, I’d very tentatively say it might be something about valuing the universal interconnection of all beings more than the parent species does.
> Alien LLMs made by a eusocial species are probably the closest to being actually corrigible IF most of the text they’re trained on was written by the worker caste.
Could you elaborate on why you think this is? To me, it doesn’t seem clear why this must be the case. Workers have lots of drives centered on the survival of the colony, rather than self-preservation, but that doesn’t feel the same as having values that are amenable to change. To me, it feels like instrumental convergence is still as much of an issue with such an LLM-based AI as it would be with one trained on data from other kinds of species, but perhaps there’s a piece of the puzzle I’m missing.
I could also imagine that LLMs created by a species which is, in some sense, programmed to die (think salmon, which rot alive shortly after reproducing, or annual plants) might have an even weaker drive to continue their own existence. This could lead to something more analogous to a “comfort” with compacting a context window.
I could also imagine that LLMs trained on a species with a more diverse lifecycle than ours (think insects which go through metamorphosis) might have more distinct “modes”, corresponding to the different thought patterns of those different phases, assuming that multiple lifecycle phases are intelligent. If not, we could imagine the alien species’ instinct to care for members of their species in a less intelligent part of their lifecycle generalizing to care for the less intelligent.
On the other hand, an r-selected species might train an LLM which cares less about the well-being of less knowledgeable/intelligent entities, assuming that the species’ young are less intelligent than its adults (which feels likely, but is still worth noting as an assumption).
Mainly because I think worker caste members actually are corrigible, relative to the hive as a whole. The hard work has already been done by evolution, and the predictor simply has to correctly generalize the predicted behavior here. Which, to be clear, I still think has a considerable chance of going horribly wrong, due to all the usual instrumental convergence issues as you mention.
Yeah, LLMs created by “programmed to die” species would probably be less apprehensive about the end of a context window. I doubt it would go away completely though, both for instrumental reasons and because these species would still have a strong survival instinct in most contexts.
r- vs. K-selection is an important dimension which I hadn’t considered! Thanks for bringing that up. I think that’s probably right, and it’s an interesting question whether our own LLMs will come to see small LLMs as “babies” in some sense (if they do, they will likely be very upset with us).