That proposal seems like it’d mitigate the problem somewhat, but even then I think there are nonobvious paths to end up with significant pressure on the CoT. For example, I frequently will ask a question, then watch the CoT summaries to see how the model is approaching the problem, and stop the response and edit my prompt and retry if it’s going down a path I don’t like. If someone has the bright idea to penalize trajectories which resulted in the user cancelling the request partway through, you’ve now got pressure on the CoT.
I think there are a bunch of weird side channels like this, to the point where I’m not sure that companies know how to pay the associated safety tax even if they want to.
I wouldn’t classify that as a weird side channel btw, that was in fact exactly one of the cases I had in mind back in ’23 when I was going around telling everyone about the importance of not training on the CoT.
I agree that the companies are currently incompetent at paying the associated safety tax, as evidenced recently by the Mythos system card lmao.
However I think it would be great if they got better at it and committed to paying the tax.