NickH comments on Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

NickH 11 Mar 2025 7:48 UTC
1 point
−2
Isn’t this just an obvious consequence of the well-known fact about LLMs that the more you constrain some subset of the variables, the more you force the remaining ones to ever more extreme values?
- Owain_Evans 12 Mar 2025 1:21 UTC
  2 points
  0
  I don’t think this explains the difference between the insecure model and the control models (secure and educational secure).