J Bostock comments on How will we do SFT on models with opaque reasoning?

J Bostock 22 Feb 2026 2:54 UTC
2 points
The technique you describe here seems like it’s very vulnerable to the decoder model colluding with the policy.
Yes, I was claiming that this was likely, not that it was desirable.