Metin Hasan comments on Unsupervised Activation Steering: Find a steering vector that best represents any set of text data

Metin Hasan 11 Jun 2025 10:42 UTC
1 point
A very interesting idea. But how would you then construct steering vectors for, let’s say, politeness, refusal, or some biases?
- Danielle Ensign 20 Jun 2025 19:43 UTC
  1 point
  I think for those cases you’re better off using standard methods (multiple choice, etc.). This technique is only useful when paired positive/negative data is more difficult to create (as with writing imitation).
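
For context on the paired-data route mentioned in the reply, one common version is a difference of means: average the activations for the positive examples, average them for the negative examples, and subtract. The sketch below assumes that is roughly what "standard methods" refers to. The function names are hypothetical, and the activation values are made-up placeholders standing in for vectors you would read from a model layer.

```python
from statistics import fmean


def mean_vector(rows):
    """Element-wise mean of a list of equal-length activation vectors."""
    return [fmean(column) for column in zip(*rows)]


def steering_vector(pos_acts, neg_acts):
    """Difference of means: mean(positive) minus mean(negative)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


# Toy placeholder activations (assumption: one vector per prompt,
# taken at a single layer). Real vectors would be much longer.
polite_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
rude_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

vec = steering_vector(polite_acts, rude_acts)
print(vec)
```

This only works when matched positive and negative sets exist, which is the limitation the reply points to for cases like writing imitation.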