One would hope indeed. But even so, we do now know that this is likely to be the kind of action that could be detected and opposed. And since I didn’t predict in advance that this would happen, especially at this capability level, the update for me is that it’s going to be significantly harder than I would hope.
One would hope that progress in interpretability will allow us to do much more refined “mind control” than adding difference vectors.
One would hope indeed. But even so, we do now know that this is likely to be the kind of action that could be detected and opposed. And since I didn’t predict in advance that this would happen, especially at this capability level, the update for me is that it’s going to be significantly harder than I would hope.