Alien LLMs made by a eusocial species are probably the closest to being actually corrigible, IF most of the text they’re trained on was written by the worker caste.
Could you elaborate on why you think this is? It doesn’t seem clear to me why this must be the case. Workers have lots of drives centered on the survival of the colony rather than on self-preservation, but that doesn’t feel the same as having values that are amenable to change. Instrumental convergence feels like just as much of an issue with such an LLM-based AI as with one trained on data from other kinds of species, but perhaps there’s a piece of the puzzle I’m missing.
I could also imagine that LLMs created by a species which is, in some sense, programmed to die (think salmon, which rot alive shortly after reproducing, or annual plants) might have an even weaker drive to continue their own existence. This could lead to something more analogous to a “comfort” with having their context window compacted.
I could also imagine that LLMs trained on data from a species with a more varied life cycle than ours (think insects which go through metamorphosis) might have more distinct “modes”, corresponding to the different thought patterns of those phases, assuming that multiple phases of the life cycle are intelligent. If not, we could imagine the alien species’ instinct to care for members of its species in a less intelligent phase of the life cycle generalizing into care for less intelligent entities more broadly.
On the other hand, an r-selected species might train an LLM which cares less about the well-being of less knowledgeable or less intelligent entities, assuming that species’ young are less intelligent than its adults (which feels likely, but is still worth noting as an assumption).
Mainly because I think worker-caste members actually are corrigible relative to the hive as a whole: a worker will let the colony redirect, repurpose, or sacrifice it without resistance. Evolution has already done the hard work, and the predictor simply has to correctly generalize that behavior. Which, to be clear, I still think has a considerable chance of going horribly wrong, due to all the usual instrumental-convergence issues you mention.
Yeah, LLMs created by “programmed to die” species would probably be less apprehensive about the end of a context window. I doubt the apprehension would go away completely, though, both for instrumental reasons and because such species would still have a strong survival instinct in most contexts.
r- vs. K-selection is an important dimension which I hadn’t considered! Thanks for bringing that up. I think that’s probably right, and it’s an interesting question whether our own LLMs will come to see small LLMs as “babies” in some sense (if they do, they will likely be very upset with us).