People are going to try to make LLMs engage in power-seeking, such as by setting up a loop that invokes a power-seeking simulacrum and does whatever it commands. It is currently unclear how much they will succeed. If they do succeed, a lot of the classical power-seeking discussion will apply to the resulting systems; if they don't, LLMs are presumably not the path to AGI.
They’re already trying (look up ChaosGPT, though that’s mostly a joke). But my question is more about what changes relative to the misalignment problems familiar from gradient-descent-trained systems. For example, is it easier or harder for the simulacrum to align its own copy running on a more powerful underlying model?