Is Lindsey using a nuanced definition of “concept injection”? I am a non-specialist, just trying to follow and understand, and I have to look up many [most] definitions and terms. This may be a trivial matter, but for me to understand, definitions matter. When I look up the meaning of “activation steering” I find something more permanent. Has any discussion focused on Lindsey’s use of the term “concept injection”, which is introduced as “an application of activation steering”?
To me the term suggests something done in training that persists in the model, when in reality it’s a per-query intervention. [Claude tells me that] another term would be “inference-time modifications”: they don’t permanently alter the model and are applied during each forward pass where the effect is desired. If activation steering refers to ‘modifying activations during a forward pass to control model behavior’ and concept injection (Lindsey’s usage) is a specific application where they “inject activation patterns associated with specific concepts directly into a model’s activations” to test introspection, then isn’t this much more like a transient inference-time modification that doesn’t permanently alter the model? [again, I am not a specialist ;-)]