Are you at all worried that Claude Mythos having been accidentally trained against CoT will corrupt future Claude models? Furthermore, I don’t understand how we can get reliable CoT monitoring if CoT is included in a model’s training data. Won’t the issue just continue to manifest in different ways?