simulus comments on silentbob’s Shortform

simulus 18 May 2025 20:01 UTC
11 points
2
There has actually been some work on visualizing this process, using a method called the “logit lens”.
The first example that I know of: https://www.lesswrong.com/posts/AcKRB8wDpdaN6v6ru/interpreting-gpt-the-logit-lens
A more thorough analysis: https://arxiv.org/abs/2303.08112
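
For anyone who hasn't seen it, here is a rough toy sketch of the core idea as I understand it: take the hidden state after each layer and decode it with the model's final unembedding, as if that layer were the last one. Everything below is made up for illustration (the three-word vocabulary, the `unembed` matrix, the `hidden_states`, and the helper names), and the layer norm omits the learned gain and bias a real model would have. A real application would pull the hidden states and unembedding weights from an actual transformer.

```python
import math

def layer_norm(x, eps=1e-5):
    # Plain layer norm without learned gain/bias (a simplifying assumption).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_lens(hidden_states, unembed):
    """Decode every layer's hidden state with the final unembedding.

    hidden_states: one d-dimensional vector per layer (a single token position).
    unembed: vocab_size x d matrix, one row per vocabulary token.
    Returns one probability distribution over the vocabulary per layer.
    """
    per_layer = []
    for h in hidden_states:
        normed = layer_norm(h)
        logits = [sum(w * x for w, x in zip(row, normed)) for row in unembed]
        per_layer.append(softmax(logits))
    return per_layer

def top_tokens(per_layer, vocab):
    # Most likely token at each layer.
    return [vocab[max(range(len(vocab)), key=lambda j: p[j])] for p in per_layer]

# Hypothetical toy data, not taken from any real model.
vocab = ["cat", "dog", "fish"]
unembed = [
    [1.0, 0.0, 0.0, -1.0],
    [0.0, 1.0, -1.0, 0.0],
    [-1.0, -1.0, 1.0, 1.0],
]
hidden_states = [
    [0.2, 0.1, 0.0, -0.1],   # after an early layer
    [0.5, 1.0, -0.5, 0.0],   # after a middle layer
    [0.0, 2.0, -2.0, 0.0],   # after the final layer
]

per_layer = logit_lens(hidden_states, unembed)
for layer, token in enumerate(top_tokens(per_layer, vocab)):
    print(f"layer {layer}: top token = {token}")
```

In this toy setup the top token changes from "cat" at the first layer to "dog" at the later ones, which is the kind of layer-by-layer refinement the linked posts visualize on real models.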