A very interesting idea. But how would you then construct steering vectors for, let’s say, politeness, refusal, or certain biases?