Is Lindsey using a nuanced definition of “concept injection”?
I am a non-specialist, just trying to follow and understand. I have to look up many [most] definitions and terms. This may be a trivial matter, but for me to understand, definitions matter.
When I look up the meaning of “activation steering” I find something more permanent. Has any discussion focused on Lindsey’s use of the term concept injection as an application of activation steering: “We refer to this technique as concept injection—an application of activation steering”?
To me the term suggests something done in training that persists in the model, when in reality it is a per-query intervention. [Claude tells me that] another term would be “inference-time modifications”: changes that don’t permanently alter the model and are applied during each forward pass where the effect is desired.
If activation steering refers to “modifying activations during a forward pass to control model behavior,” and concept injection (Lindsey’s usage) is a specific application in which they “inject activation patterns associated with specific concepts directly into a model’s activations” to test introspection, then isn’t this much more like a transient inference-time modification that doesn’t permanently alter the model?
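For what it’s worth, here is a minimal sketch of how I now picture such an intervention (assuming a PyTorch / Hugging Face style model; the layer layout, the precomputed `concept_vector`, and the `steered_generate` helper are my own hypothetical illustration, not Lindsey’s code). The point is simply that no weights change: the concept vector is added inside a temporary hook, and the hook is removed once the steered forward pass is finished.

```python
# Minimal sketch of a transient activation-steering intervention (hypothetical).
# Nothing here touches model.parameters(); the modification lives only inside the
# hook, and the hook is removed as soon as the steered generation is done.
import torch

def steered_generate(model, tokenizer, prompt, concept_vector, layer_idx, scale=4.0):
    """Run one generation with `concept_vector` added to a chosen layer's activations."""

    def add_concept(module, inputs, output):
        # output[0] is the hidden-state tensor of shape (batch, seq_len, hidden_dim);
        # concept_vector is assumed to have shape (hidden_dim,).
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + scale * concept_vector  # applied on this forward pass only
        return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

    layer = model.model.layers[layer_idx]          # layer layout assumed; varies by architecture
    handle = layer.register_forward_hook(add_concept)
    try:
        ids = tokenizer(prompt, return_tensors="pt").input_ids
        out = model.generate(ids, max_new_tokens=50)
    finally:
        handle.remove()                            # the model is back to normal afterwards
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

If that is roughly right, then the “injection” happens anew on every query, which is why “inference-time modification” reads as a clearer label to me.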
[again, I am not a specialist ;-)]
First, thank you for your work and this post. I am not a specialist, just interested, but confused. I don’t get the significance of the results, but appreciate the thought and effort you put into this project.
I am pushing back on the “romantic framing” that LLMs are “blind models” that somehow develop some degree of internal spatial understanding of Earth through pure reasoning or emergent intelligence.
In this case, didn’t the author in effect say to the model: “given this list of numbers (which happen to be latitude and longitude pairs), access your core intelligence (learned parameters / weights / internal representations) and decide whether each one represents land or water”?
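As I understand it, the probe amounts to something like this (a hypothetical reconstruction of my reading of the setup; the prompt wording, the sampling, and the `query_model` placeholder are assumptions, not the author’s actual code):

```python
# Hypothetical sketch of the kind of query I am describing (not the author's code).
# The model only ever sees the two raw numbers and is asked to classify them.
import random

def land_or_water_prompt(lat: float, lon: float) -> str:
    # All geographic context must come from the coordinates themselves.
    return (f"Coordinates: latitude {lat:.2f}, longitude {lon:.2f}. "
            "Answer with a single word: land or water.")

# Sample some points; in the post, the model's answers are then plotted as a world map.
points = [(random.uniform(-90, 90), random.uniform(-180, 180)) for _ in range(5)]
for lat, lon in points:
    prompt = land_or_water_prompt(lat, lon)
    print(prompt)
    # answer = query_model(prompt)  # placeholder for whichever LLM API the author used
```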
So, how big a leap would it be for the model to “think”: hmm… latitude and longitude pairs sound like a map. Maybe I should look them up in the textual map data I have been trained on?
Surely there must have been many such sources in the model’s training data? Surely there would be copious amounts of text covering land and water masses.
So, given the vast bulk of text the model was trained on, would that not have included many public-access tables of latitude/longitude coordinates, such as GeoNames, Natural Earth Data, OpenStreetMap, the Global Self-consistent, Hierarchical, High-resolution Geography Database (GSHHG), and NASA/USGS satellite data?
Perhaps “discovering how a blind model sees Earth” overstates what is really “visualizing which geographic patterns persisted from text training data that included extensive coordinate databases.”
How do I get to the understanding that the resulting elliptical blobs of land relate to internalized concepts of geographical distance, and that some kind of natural abstraction helps in the identification of continents?
Cheers.
Leo