Sandy Fraser comments on Cycle-Consistent Activation Oracles

Sandy Fraser 20 May 2026 1:32 UTC
1 point
Very nice! What does color represent in the heatmaps? Cycle loss/accuracy?

Each is scored by how well the encoder reconstructs the original activation (cosine similarity).

If the base model uses non-normalized activations, shouldn’t the dot product be your training signal? Otherwise information in the magnitude of the activations would be ignored. I wonder if that might account for some of the inaccuracy (0.8).
- slavachalnev 20 May 2026 4:13 UTC
  2 points
  The heatmaps show two separate things. On the original model’s tokens, the color is the cosine similarity of the reconstruction. On the decoder’s output tokens, it is the weight the encoder’s pooling head puts on that token’s activation.
  
  Yeah, cosine sim does throw away magnitude information. Dot product wouldn’t work as a training signal, though, because you can maximise it just by predicting a very large vector. MSE would be a reasonable choice.
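
To make the trade-off in the reply above concrete, here is a minimal sketch in plain Python. The vectors `target` and `scaled` are made-up toy values, not activations from the post, and the helper functions are illustrative rather than anything from the authors' code. It compares a perfect reconstruction with one that has the right direction but ten times the norm.

```python
import math

def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine_similarity(a, b):
    """Dot product of the two vectors after normalizing each to unit length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Hypothetical "true" activation and a reconstruction that points the
# same way but is 10x too large (toy numbers, chosen for illustration).
target = [3.0, 4.0]
scaled = [10.0 * x for x in target]

# Cosine similarity cannot tell the two reconstructions apart.
cos_exact = cosine_similarity(target, target)
cos_scaled = cosine_similarity(target, scaled)

# The raw dot product prefers the oversized vector, so maximizing it
# would reward blowing up the norm.
dot_exact = dot(target, target)
dot_scaled = dot(target, scaled)

# MSE is zero only for the exact match and penalizes the wrong norm.
mse_exact = mse(target, target)
mse_scaled = mse(target, scaled)

print("cosine:", cos_exact, cos_scaled)
print("dot:   ", dot_exact, dot_scaled)
print("mse:   ", mse_exact, mse_scaled)
```

In this toy case cosine similarity scores both reconstructions the same, the dot product scores the oversized one higher, and only MSE singles out the exact match.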