So just to be clear about your proposal, you are suggesting that an LLM is internally represented as a mixture-of-experts model, where distinct non-overlapping subnetworks represent its behavior in each of the distinct roles it can take on?
I don’t want to claim any particular model, but I think I’m trying to say “if the simulacra model is true, it doesn’t follow immediately that there is a privileged ‘true’ simulacrum in the simulator (e.g. a unique stable ‘state’ which cannot be transitioned into other ‘states’)”. If I had to guess, to the extent we can say the simulator has discrete “states” or “masks” like “helpful assistant” or “paperclip maximizer”, there are some states that are relatively stable (e.g. there are very few strings of tokens that can transition the waluigi back to the luigi), but there isn’t a unique stable state which we can call the “true” intelligence inside the LLM.
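To make “relatively stable but not uniquely stable” concrete, here is a toy sketch (the states, transition probabilities, and everything else are invented for illustration, not a claim about real LLM internals): model the masks as a Markov chain over a few coarse persona states, where the waluigi state is nearly absorbing but nothing is literally inescapable.

```python
import numpy as np

# Toy Markov chain over persona "states". All numbers are invented purely to
# illustrate the point above, not measured from any model.
states = ["luigi", "waluigi", "neutral"]
T = np.array([
    [0.90, 0.08, 0.02],   # luigi: fairly stable, but can be flipped
    [0.01, 0.98, 0.01],   # waluigi: nearly absorbing -- few escape routes
    [0.30, 0.30, 0.40],   # neutral: easily pulled in either direction
])

# Long-run behaviour: raise the transition matrix to a high power.
long_run = np.linalg.matrix_power(T, 200)
for name, row in zip(states, long_run):
    print(f"start as {name:7s} -> long-run distribution {np.round(row, 3)}")
# Mass piles up on the near-absorbing state, but no state is literally
# inescapable, and nothing here picks out a unique "true" state.
```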
If I understand the simulacra model correctly, it is saying that the LLM is somewhat like what you describe, with four significant changes (a rough sketch contrasting the two pictures follows this list):
The mixture is over “all processes that could produce this text”, not just “experts”.
The processes are so numerous that they are more like a continuous distribution than a discrete one.
The subnetworks are very much not non-overlapping, and in fact are hopelessly superimposed.
[Edit: adding this one] The LLM only approximates the mixture; it doesn’t capture it perfectly.
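To pin down the contrast between the two pictures, here is a rough numerical sketch (all shapes, weights, and function names are made up; this is an illustration of the framing above, not real model internals): a sparse gate that routes to one of a few disjoint experts, versus marginalising over many latent “processes”, roughly p(token | context) = Σ_process p(process | context) · p(token | process, context).

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 8
d = 4

# --- Discrete MoE picture: a sparse gate routes to one of a few experts. ---
def moe_next_token_probs(context_features, experts, gate_weights):
    """Route to the single highest-scoring expert (hard, sparse gating)."""
    scores = gate_weights @ context_features
    winner = int(np.argmax(scores))
    logits = experts[winner] @ context_features
    return np.exp(logits) / np.exp(logits).sum()

# --- Simulator picture: marginalise over many latent "processes" that could
# --- have produced the text so far (crudely discretised here).
def simulator_next_token_probs(context_features, processes, prior_weights):
    """p(token | ctx) = sum over processes of p(process | ctx) * p(token | process, ctx)."""
    scores = prior_weights @ context_features
    posterior = np.exp(scores) / np.exp(scores).sum()   # p(process | ctx)
    probs = np.zeros(vocab_size)
    for w, proc in zip(posterior, processes):
        logits = proc @ context_features
        probs += w * np.exp(logits) / np.exp(logits).sum()
    return probs

context = rng.normal(size=d)
experts = rng.normal(size=(3, vocab_size, d))     # few, disjoint experts
processes = rng.normal(size=(50, vocab_size, d))  # many, overlapping processes
print(moe_next_token_probs(context, experts, rng.normal(size=(3, d))))
print(simulator_next_token_probs(context, processes, rng.normal(size=(50, d))))
```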
If I had to describe a single model of how LLMs work right now, I’d probably go with the simulacra model, but I am not 100% confident in it.
I don’t think the simulacra model says that there is a “true” simulacrum/mask. As you point out, that is kind of incoherent, and I think the original simulator/mask distinction does a good job of making clear that the simulator is not itself one of the masks.
However, I specifically asked the question about MoE because I think the different identities of an LLM are not a mixture of experts. Instead, I think there are some basic strategies for generating text that are useful for many different simulated identities, and the simulator model suggests thinking of the “true” LLM as this amalgamation of strategies.
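A deliberately toy sketch of what I mean by an “amalgamation of strategies” (all names and strategies here are made up for illustration): the strategies live in one shared pool, and an identity is little more than a preference ordering over that pool rather than its own separate machinery.

```python
# Hypothetical shared pool of generic text-production "strategies".
def rhetorical_question(topic):
    return f"But what does {topic} really mean?"

def cite_evidence(topic):
    return f"Studies on {topic} suggest otherwise."

def hedge(topic):
    return f"I might be wrong about {topic}, but..."

SHARED_STRATEGIES = [rhetorical_question, cite_evidence, hedge]

# Each "identity" is only a preference ordering over the shared strategies,
# not a separate subnetwork with its own machinery.
PERSONAS = {
    "helpful assistant": [hedge, cite_evidence, rhetorical_question],
    "debate bro":        [rhetorical_question, cite_evidence, hedge],
}

def respond(persona, topic):
    # The persona chooses *which* shared strategy to lead with; the strategy
    # itself lives in the shared pool (the "amalgamation").
    return PERSONAS[persona][0](topic)

print(respond("helpful assistant", "alignment"))
print(respond("debate bro", "alignment"))
```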
This is especially so with the shoggoth image. The idea of using an eldritch abomination to illustrate the simulator is to suggest that the simulator is not at all of the same kind as the identities it is simulating, nor similar to any kind of mind that humans are familiar with.
What if we made an analogy to a hydra, one where the body could do a fair amount of thinking itself, but where the heads (previously the “masks”) control access to the body (previously the “shoggoth”)? In this analogy you’re saying the LLM puts its reused strategies in the body of the hydra and the heads outsource those strategies to the body? I think I’d agree with that.
In this analogy, my point in this post is that you can’t talk directly to the hydra’s body, just as you can’t talk to a human’s lizard brain. At best you can have a head role-play as the body.
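A toy rendering of that constraint (hypothetical, just to pin down what “can’t talk directly to the body” means here): the shared machinery is only reachable through a head, so even a head role-playing as the body is still a head.

```python
# Illustrative only: the body does the shared computation, but the only
# public interface goes through a head, so every answer is persona-mediated.
class HydraBody:
    def _compute(self, prompt: str) -> str:
        # Shared machinery reused by every head; the leading underscore marks
        # it as not directly reachable from outside.
        return prompt.upper()

class HydraHead:
    def __init__(self, body: HydraBody, persona: str):
        self._body = body
        self.persona = persona

    def speak(self, prompt: str) -> str:
        # Every output is the body's computation filtered through a persona.
        return f"[{self.persona}] {self._body._compute(prompt)}"

body = HydraBody()
assistant = HydraHead(body, "helpful assistant")
print(assistant.speak("hello"))
# Even a head "role-playing as the body" is still a head:
faux_body = HydraHead(body, "the shoggoth itself")
print(faux_body.speak("hello"))
```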
Could work.