You can learn a per-token bias at every layer to see where in the model the residual stream stops representing the original embedding (or a linear transformation of it), as in https://www.lesswrong.com/posts/P8qLZco6Zq8LaLHe9/tokenized-saes-infusing-per-token-biases
You could also plot the cos-sims of the resulting biases across layers to see how much the representation rotates.
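A minimal sketch of what I mean (not the exact setup from the linked post): for each layer, learn a per-token bias table trained to reconstruct the residual stream at positions where that token occurs, then compare the learned biases to the embedding (and to each other across layers) via cosine similarity. The model name, the single toy sentence, and the hyperparameters are all just placeholders.

```python
import torch
import torch.nn.functional as F
from transformer_lens import HookedTransformer

model = HookedTransformer.from_pretrained("gpt2", device="cpu")
n_layers, d_model, d_vocab = model.cfg.n_layers, model.cfg.d_model, model.cfg.d_vocab

# One learned bias vector per vocab entry per layer.
biases = torch.nn.Parameter(torch.zeros(n_layers, d_vocab, d_model))
opt = torch.optim.Adam([biases], lr=1e-2)

texts = ["The quick brown fox jumps over the lazy dog."]  # use a real corpus in practice

for step in range(100):
    for text in texts:
        tokens = model.to_tokens(text)  # [1, seq]
        with torch.no_grad():
            _, cache = model.run_with_cache(tokens)
        loss = 0.0
        for layer in range(n_layers):
            resid = cache["resid_post", layer][0]  # [seq, d_model]
            pred = biases[layer, tokens[0]]        # per-token bias for each position
            loss = loss + F.mse_loss(pred, resid)
        opt.zero_grad()
        loss.backward()
        opt.step()

# Compare each layer's learned bias to the original embedding W_E, and to the
# previous layer's bias, averaged over the tokens that actually appeared.
seen = torch.unique(model.to_tokens(texts[0]))
with torch.no_grad():
    for layer in range(n_layers):
        to_emb = F.cosine_similarity(biases[layer, seen], model.W_E[seen], dim=-1)
        print(f"layer {layer}: mean cos-sim to W_E = {to_emb.mean():.3f}")
        if layer > 0:
            to_prev = F.cosine_similarity(biases[layer, seen], biases[layer - 1, seen], dim=-1)
            print(f"  mean cos-sim to previous layer's bias = {to_prev.mean():.3f}")
```

The layer at which the cos-sim to W_E drops off is a rough indicator of where the model stops carrying the original embedding around; the layer-to-layer cos-sims show how gradually it rotates away.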