During the training process, we observe that CoT often exhibits language mixing, particularly when RL prompts involve multiple languages. To mitigate this, we introduce a language consistency reward during RL training, calculated as the proportion of target-language words in the CoT. Although ablation experiments show that such alignment results in a slight degradation in the model's performance, this reward aligns with human preferences, making the CoT more readable.
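For concreteness, here's a minimal sketch of how a reward of that shape could be computed. The paper doesn't specify how "target language words" are identified, so the word-level language check below (an ASCII-letters heuristic for English) and the function name are my own stand-ins, not the authors' method:

```python
import re


def language_consistency_reward(cot: str, target_lang: str = "en") -> float:
    """Proportion of words in the CoT that belong to the target language.

    Hypothetical sketch: `is_target_language` stands in for whatever
    word-level language ID DeepSeek actually uses (unspecified in the paper).
    """

    def is_target_language(word: str) -> bool:
        # Crude heuristic for English: the word is purely ASCII letters,
        # optionally with an apostrophe (e.g. "don't"). A real implementation
        # would use a language-ID model or Unicode-script checks.
        if target_lang == "en":
            return bool(re.fullmatch(r"[A-Za-z]+(?:'[A-Za-z]+)?", word))
        raise NotImplementedError(target_lang)

    # Tokenize on whitespace, then strip surrounding punctuation.
    words = [w.strip(".,;:!?()[]{}\"'") for w in cot.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    return sum(is_target_language(w) for w in words) / len(words)


# A mostly-English CoT with a French fragment scores below 1.0:
# language_consistency_reward("First we check si la réponse est correcte")
```

This reward would then be blended into the RL objective alongside the accuracy reward, which is where the trade-off below comes from.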
I also found this trade-off between human readability and performance noteworthy.
Side note: Claude 3.5 Sonnet does CoT language-mixing after a bit of prompting and convincing. I'm not sure about the effects on performance. Also, the closeness narratively implied by having it imitate the idiosyncratic mixture I was using to talk to it probably exacerbated sycophancy.