Shankar Sivarajan comments on Steering Might Stop Working Soon

Shankar Sivarajan 6 Apr 2026 2:58 UTC
3 points
−7
This also seems consistent with the companies that release open models (here, Google with Gemma) doing something to them that makes this simple steering not work, for Safety reasons, while the larger and more capable internal model can be steered just fine.
- J Bostock 6 Apr 2026 8:05 UTC
  4 points
  1
  That is true, but the entire point of Gemma is to be a testbed for AI research, which would include steering. If Google did this deliberately and didn’t say so, that would be quite bad on their part. I also don’t think it’s particularly likely.
  If they’re doing it by mistake as part of normal safety training, then I hope they figure that out before steering becomes load-bearing for Gemini’s safety.
  Most of my money is still on “steering to produce a specific false fact is particularly difficult, compared to other steering challenges” explaining the absolute difficulty I had, possibly with a side of “I’m not very good at steering”. It’s the relative difficulty of steering the Gemma models that actually worries me.