They’re already trying (look up ChaosGPT, though that’s mostly a joke). But my question is more about how this differs from the familiar misalignment problems of gradient descent. For example, is it easier or harder for the simulacrum to align a copy of itself running on a more powerful underlying model?