Realmbird

Karma: 74

Realmbird 2 Jun 2026 1:32 UTC
1 point
0
in reply to: ovindu-a’s comment on: NLA Thought Anchors
For this NLA experiment at the 7B scale, I used two 3090s:
one for the AV on SGLang and the other for generating responses and running the AR.
With 40 rollouts per prompt on GSM8K, the total time was around 24 hours.
It should be much faster if you ran the 7B model on an A100.

Realmbird 1 Jun 2026 0:17 UTC
1 point
0
in reply to: Realmbird’s comment on: Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
For Qwen2.5-7B-Instruct’s NLAs, I found evidence that the answer appears in the AV output more often as the token position approaches the model’s final answer.

Realmbird 30 May 2026 18:36 UTC
1 point
0
in reply to: ryan_greenblatt’s comment on: Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations
I mean token position, e.g. the final answer token vs. a border token. The AV on the final answer token shows the final answer at a higher rate. For the 27B results, which token was used?

Realmbird 4 May 2026 15:14 UTC
1 point
0
in reply to: joseph_c’s comment on: MHC Interp #1: Previous-Token Heads Become Attention Sinks Under Manifold-Constrained Hyper-Connections
Cool idea

Realmbird 18 Apr 2026 18:11 UTC
1 point
0
on: Can we interpret latent reasoning using current mechanistic interpretability tools?
Given that CoDI throws away the hidden state and only uses the KV values on the <|eocot|> token, the accuracy drop after latent 5 could just be that the KV values can’t store more information.

Realmbird 18 Aug 2025 21:10 UTC
1 point
0
in reply to: Paul Bogdan’s comment on: Thought Anchors: Which LLM Reasoning Steps Matter?
How did you learn that vertical attention corresponded to sentences?