That is true, but the entire point of Gemma is to be a testbed for AI research, which would include steering. If Google did this deliberately and didn’t say so, that would be quite bad on their part. I also don’t think it’s particularly likely.
If they’re doing it by mistake as part of normal safety training, then I hope they figure that out before steering becomes load-bearing for Gemini’s safety.
Most of my money is still on “steering to produce a specific false fact is particularly difficult, compared to other steering challenges” explaining the absolute difficulty I had, possibly with a side of “I’m not very good at steering”. It’s the relative difficulty of steering the Gemma models that actually worries me.