Oh man. The Wichers et al. math/sycophancy experiments were conducted on the original Gemma 2B IT, a model from a year and a half ago. I think it would've made things a good bit more convincing if the experiments had been done on Gemma 3 (and preferably on a bigger model/harder task).
IIRC the main requirement we had was that the model should get better at math with more training on the math task, which is the case for Gemma 2B, but isn't the case for e.g. Qwen 2.5 models, which are already trained on tons of math and are somewhat well elicited by default.