Rauno Arike comments on Vestigial reasoning in RL

Rauno Arike 14 Apr 2025 10:43 UTC
Daniel’s argument against a length penalty is from this doc:
We want our models to learn to blather and babble freely, rather than thinking carefully about how to choose their words. Because if instead they are routinely thinking carefully about how to choose their words, that cognition might end up executing strategies like “use word X instead of Y, since that’ll avoid suspicion.” So, let’s try to avoid incentivizing brevity.
There’s also a comment by Lukas Finnveden that argues in favor of a length penalty:
downside: more words gives more opportunity for steganography. You can have a much lower bit-rate-per-word and still accomplish the same tasks.
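To make the bit-rate point concrete, here is a rough back-of-the-envelope sketch. The payload size and word counts are made-up numbers for illustration, not figures from either source, and `required_bits_per_word` is a hypothetical helper: for a fixed hidden payload, the bits each word has to carry shrink in proportion to the length of the text.

```python
def required_bits_per_word(payload_bits: int, num_words: int) -> float:
    """Average number of hidden bits each word must carry to fit the payload."""
    if num_words <= 0:
        raise ValueError("num_words must be positive")
    return payload_bits / num_words


if __name__ == "__main__":
    # Hypothetical hidden message size, chosen only for illustration.
    payload_bits = 64
    for num_words in (50, 200, 1000):
        rate = required_bits_per_word(payload_bits, num_words)
        print(f"{num_words} words: {rate:.3f} bits per word")
```

Under these assumptions, a 20x longer chain of thought needs only a twentieth of the per-word bit rate to carry the same message, which is the sense in which a length penalty would raise the bar for steganography.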