anaguma comments on OpenAI: Detecting misbehavior in frontier reasoning models

anaguma 11 Mar 2025 19:13 UTC
5 points
3
I would guess that the reason it hasn’t devolved into full neuralese is that there is a KL divergence penalty, similar to how RLHF works.
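
To make the guess concrete, here is a toy sketch of what a KL penalty does. It is not taken from the OpenAI post and is not a claim about how those models were trained. The function names, the three-token distributions, and the `beta` value are all made up for illustration. The idea is that the reward being optimized is the task reward minus `beta` times the KL divergence between the policy and a frozen reference model, so outputs that drift far from the reference distribution (as neuralese would) get penalized.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same outcomes.

    Assumes every q[i] > 0 wherever p[i] > 0.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def penalized_reward(task_reward, policy_probs, reference_probs, beta=0.1):
    """Task reward minus beta * KL(policy || reference).

    beta is a hypothetical penalty coefficient chosen for illustration.
    """
    return task_reward - beta * kl_divergence(policy_probs, reference_probs)

# Hypothetical next-token distributions over a 3-token vocabulary.
reference = [0.7, 0.2, 0.1]    # frozen reference (base) model
near = [0.6, 0.3, 0.1]         # policy that stays close to the reference
far = [0.05, 0.05, 0.9]        # policy that has drifted far from it

# Same task reward in both cases; only the KL term differs.
print(penalized_reward(1.0, near, reference))
print(penalized_reward(1.0, far, reference))
```

With the same task reward, the drifted policy scores lower, so the optimizer only moves away from the reference distribution when the task reward gain outweighs the penalty.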