Ah yeah, I was using “anomaly tiling” as shorthand for the description in the paper of upstream features that detect anomalies along preferred directions and collectively tile the space of possible anomalies. And by “upstream carrier population,” I meant the set of upstream weak evidence-carrier features that sit before the gates. Sorry for the confusion; I was trying to compress a lot.
As for why punctuation sensitivity matters: it caught my attention because a single punctuation change produced significantly different responses, and I found that interesting. I think it’s clearer, though, if I connect this to Anthropic’s paper “When Models Manipulate Manifolds: The Geometry of a Counting Task.” That paper gives a concrete example of a model representing a task-relevant variable geometrically, with sparse feature families locally tiling a manifold and attention heads transforming that geometry to make a decision.
So I don’t really think it’s an issue; I think it’s interesting because a punctuation change may be a tiny external cue that moves the model into a different region of this internal geometry. And if that shift changes which upstream features activate, it could help reveal how the model routes self-state/anomaly information before the gate features produce an answer.
Okay, yeah, I agree punctuation matters semantically and that some of the model behavior is expected. The reason I don’t think it’s strictly that: in at least one of my project’s runs, the visible response at turn index 10 did not change at all, while the internal attention/geometry deltas still spiked. The behavioral difference only surfaced at turn index 11. So surface-level equivalent completions can still carry state divergence forward.
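To make that concrete, here’s roughly the kind of check I mean: run the same prompt twice with only a punctuation change, confirm the visible output is identical, and still look at per-layer internal deltas. This is a minimal sketch, not my actual pipeline; I’m assuming a HuggingFace causal LM, treating a single greedy token as a stand-in for the “visible response,” and using cosine distance on last-token hidden states as the delta metric (the model name and prompts are placeholders).

```python
# Minimal sketch, not my real pipeline. Assumptions: a HuggingFace causal LM,
# one greedy token as the stand-in for the "visible response," and cosine
# distance on last-token hidden states as the internal delta metric.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; swap in whatever model you're probing
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def probe(prompt):
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, output_hidden_states=True)
    next_id = out.logits[0, -1].argmax().item()      # greedy "visible" output
    states = [h[0, -1] for h in out.hidden_states]   # last-token state per layer
    return next_id, states

a_next, a_states = probe("I think the run finished.")
b_next, b_states = probe("I think the run finished...")  # punctuation-only edit

print("visible outputs match:", a_next == b_next)
for layer, (a, b) in enumerate(zip(a_states, b_states)):
    delta = 1 - torch.cosine_similarity(a, b, dim=0).item()
    print(f"layer {layer:2d}  cosine delta = {delta:.4f}")
```

A single greedy token is obviously a crude proxy for a whole turn, but the point is the shape of the measurement: identical output, nonzero internal delta, and that delta can be carried forward into the next turn.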
This is why I’m comparing it to the “When Models Manipulate Manifolds” paper: it’s not that question marks or ellipses are exactly like newlines in content, it’s that both are structural markers that can trigger internal state updates. That paper gives a mechanistic example where heads route boundary information through geometry.
And the connection to “Mechanisms of Introspective Awareness” is that their evidence carriers/gates show how weak, distributed internal signals can be present before, or apart from, straightforward output behavior.
So that makes me think of these heads as possible candidate upstream routers rather than the final detector. In the introspection paper, the detection mechanism is mostly localized to distributed mid-to-late MLP computation, with weak evidence carriers and gate features. If punctuation and discourse transitions consistently recruit the same mid-to-late attention families, those heads may function as stable upstream routers for structural state information, and their outputs could then feed into later MLP features that integrate weak signals into reportable evidence.
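A cheap first pass at testing that routing hypothesis could look something like the sketch below. Again, this is only illustrative: I’m assuming attention mass from the final position onto punctuation/boundary tokens works as a crude “recruitment” score, the prompt set is a toy, and the config attribute names are GPT-2-specific.

```python
# Illustrative sketch only. Assumptions: attention-to-boundary-token mass as a
# crude "recruitment" score, a toy prompt set, and GPT-2-style config names
# (other models use num_hidden_layers / num_attention_heads instead).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

BOUNDARY = {"?", "...", ".", "!", "\n"}  # crude stand-in for structural markers

prompts = [
    "Did the run finish?\nYes, and then the logs rotated.",
    "The run finished...\nThen the logs rotated.",
]

n_layers, n_heads = model.config.n_layer, model.config.n_head
score = torch.zeros(n_layers, n_heads)  # score[l, h] = mean boundary attention

for p in prompts:
    enc = tok(p, return_tensors="pt")
    # Decode each token id so newline/ellipsis markers are matched literally.
    tok_strs = [tok.decode([tid]) for tid in enc.input_ids[0]]
    is_boundary = torch.tensor(
        [float(any(m in s for m in BOUNDARY)) for s in tok_strs]
    )
    with torch.no_grad():
        attn = model(**enc, output_attentions=True).attentions  # (1,H,S,S) per layer
    for layer, a in enumerate(attn):
        # Attention mass the final position places on boundary tokens, per head.
        score[layer] += (a[0, :, -1, :] * is_boundary).sum(dim=-1)

score /= len(prompts)
top = torch.topk(score.flatten(), k=5).indices
print("candidate router heads (layer, head):",
      [(int(i) // n_heads, int(i) % n_heads) for i in top])
```

Heads that score high consistently across many structurally varied prompts, and not just on one template, would be the candidates worth ablating or activation-patching to see whether the downstream MLP gate features actually depend on them.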