I don’t get why people disagree with me but don’t comment, so I’ll do it myself. There is one thing we already do to get use out of models that are misaligned with our goals: we jailbreak them. We can do the same with scheming models and jailbreak them to get useful outputs. Or you might expect a model to be useful most of the time and only scheme occasionally, in which case you can still get useful outputs from it. Validating those outputs is still a problem, though.