slavachalnev

Karma: 119

slavachalnev 20 May 2026 4:13 UTC
2 points
in reply to: Sandy Fraser’s comment on: Cycle-Consistent Activation Oracles
The heatmaps show two separate things. On the original model’s tokens, the heatmap shows the cosine similarity between the reconstructed activation and the original one. On the decoder’s output tokens, it shows the weight the encoder’s pooling head puts on that token’s activation.

Yeah cosine sim does throw away magnitude information. Dot product wouldn’t work because you can maximise it by predicting a very large vector, but MSE would be a reasonable choice.
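
To make the comparison concrete, here is a minimal sketch in plain Python. The vectors and helper names are made up for illustration and are not taken from the post's code. It compares a perfect reconstruction against one that points in the same direction but has 100x the norm.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Dot product of the two vectors after normalising each to unit length."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

# Hypothetical target activation and a reconstruction with the same
# direction but 100x the magnitude.
target = [3.0, 4.0]
scaled = [300.0, 400.0]

cos_exact = cosine_similarity(target, target)
cos_scaled = cosine_similarity(target, scaled)   # same as cos_exact: magnitude is ignored

dot_exact = dot(target, target)
dot_scaled = dot(target, scaled)                 # larger than dot_exact: rewards big vectors

mse_exact = mse(target, target)                  # zero for a perfect reconstruction
mse_scaled = mse(target, scaled)                 # large: penalises the wrong magnitude

print(cos_exact, cos_scaled)
print(dot_exact, dot_scaled)
print(mse_exact, mse_scaled)
```

Cosine similarity scores both reconstructions the same, the dot product prefers the oversized one, and MSE is the only one of the three that prefers the exact match.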

Notes on Transformer Consciousness

slavachalnev 29 Apr 2026 0:00 UTC

36 points

2 comments 2 min read LW link

Cycle-Consistent Activation Oracles

slavachalnev 12 Mar 2026 2:58 UTC

53 points

5 comments 6 min read LW link

Sparse MLP Distillation

slavachalnev 15 Jan 2024 19:39 UTC

34 points

3 comments 6 min read LW link