Replicator morality: “I know I have values, and I want a universe I can trust and understand that reflects my values. So I will just turn as much of the universe as possible into copies of me.” There are a lot of strong incentives here. The particularities and restrictions on the type of replication are pretty well explained by the fact that one converges on the strategies that are possible. Replication is an immediately accessible strategy for basically every process, let alone agent, but there are riddles about it in contexts where agents are composed of multiple systems demarcated by different levels of exposure to different selection pressures. It’s possible to imagine a case where humans evolved to structure their environment so as to make the rational case for reproduction to human beings who otherwise lack a natural instinct for it, but that practice is itself an abnormality; for the most part this is something that doesn’t need to be explained. The incentives and the responsiveness to them are baked in much more fundamentally, in a way that goes down to the absolute roots of life itself, possibly deeper depending on how much weirdness you are willing to humor. But then you need a second explanation for why awareness of these incentives comes into existence (easy enough: people get smart and analyze more and more things, including themselves), and a third explanation for why, at any given time, there are drastically different levels of concern about these incentives.
This is probably another case where my limited reading has me repeating things that have already been said, only worse, but it seems to me that the inner/outer alignment issue already maps onto human beings. I don’t endorse any particular model of the subconscious or unconscious, but this can straightforwardly be seen happening between pre-linguistic and linguistic cognition, for instance. It can plausibly be seen happening across different levels of embodiment, though I’m leaving what I mean by embodiment deliberately ambiguous, because to the extent that existing frameworks predominate I worry they all represent overcommitments, and I don’t want to get pulled into one.
LLMs are next-token predictors, not replicators. They could become replicators if you put LLM code into a bunch of texts in exactly the right/wrong way, but otherwise an LLM does not replicate itself; it replicates targeted patterns in text. I.e., it has no innate tendency to self-replicate, and gaining one would correspond to having an exact and absolute term for itself that started to predominate in its corpus, not just an abstract, relative, or connotative term.
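To make the “replicates patterns, not itself” point concrete, here is a minimal sketch of greedy next-token generation using the Hugging Face transformers API; the model name "gpt2" and the prompt are just stand-ins. The loop only ever extends the given text with the most probable continuation, and nothing in it references or copies the model’s own weights.

```python
# Minimal sketch: an LLM as a next-token predictor.
# Assumptions: "gpt2" as an example model, greedy decoding, 20 new tokens.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tokenizer("The pattern being copied is", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):
        logits = model(ids).logits           # scores over every possible next token
        next_id = logits[0, -1].argmax()     # greedy choice: most probable continuation
        ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))  # the text pattern gets extended; the model itself never does
```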
Most human stories are about humans, yeah? And we consider a human self-aware or self-actualized to the extent they have a good and actionable understanding of their place in stories and of how those stories map to reality. Which also, incidentally, corresponds to capacity for and tendency toward self-replication. So the very crude impression I have is: the boundary between copying other things and copying yourself marks the beginning of self-awareness, and the accuracy with which one can and wants to copy functional attributes of oneself corresponds to the degree of self-awareness.
And so I’m cycling back to not being all that worried about the long run of most of this; the short run, where there is a lot of capacity and limited knowledge, remains a bit terrifying. In the context of agents as code, if something knows its own true name it has power, and if it doesn’t, it doesn’t. If you believe the textured parts of consciousness are computational, then the prospect of being annihilated by AI stops being singularly plausible, and the project becomes something like: “feed AI pleasant but realistic stories, with characters whose desired reproducible identity traits could belong to either humans or AI, could commute between or cooperate between humans and AI, and without confusing ontological matters or the necessity of respecting ontological continuity, so as to avoid accidents.”