[Question] Transformer Mech Interp: Any visualizations?

Joyee Chen, 18 Jan 2023 4:32 UTC

3 points

After getting to the part of one of Neel Nanda’s interp demos that introduces the Logit Lens and Layer Attribution, I'm having trouble visualizing them the way I could for simpler concepts (e.g. residual streams, which I understood mainly by drawing them). Does anybody have good resources with illustrations? (I know Nanda had a great colorful runthrough, but it was only at a high level and for encoder-decoder machines, and thus not generalizable.)
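For concreteness, here is a minimal sketch of what the two ideas compute, as a toy in pure Python rather than the demo's actual code. Everything in it is an assumption made up for illustration: the 3-dimensional residual stream, the `W_U` unembedding vectors, the `layer_outputs` values, and the "cat" vs "dog" token pair. It also ignores the final LayerNorm that a real model applies before unembedding. The logit lens reads the logit difference off the residual stream accumulated up to each layer, while layer attribution reads it off what each layer wrote on its own.

```python
# Toy logit lens and layer attribution (no real model; all values are made up).

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Hypothetical unembedding: one direction per vocabulary token (d_model = 3).
W_U = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
}

# Hypothetical vectors each component writes into the residual stream
# at the final token position.
layer_outputs = [
    ("embed",   [0.2, 0.3, 0.0]),
    ("layer_0", [0.1, 0.4, 0.5]),
    ("layer_1", [1.5, -0.2, 0.1]),
]

# Direction in the residual stream that raises logit("cat") - logit("dog").
correct, wrong = "cat", "dog"
diff_dir = [a - b for a, b in zip(W_U[correct], W_U[wrong])]

residual = [0.0, 0.0, 0.0]
logit_lens = []         # logit diff of the accumulated residual stream
layer_attribution = []  # logit diff contributed by each component alone

for name, out in layer_outputs:
    residual = add(residual, out)
    logit_lens.append((name, dot(residual, diff_dir)))
    layer_attribution.append((name, dot(out, diff_dir)))

# Crude text "plot": one row per component.
for (name, lens), (_, attr) in zip(logit_lens, layer_attribution):
    print(f"{name:8s} accumulated={lens:+.2f} own_contribution={attr:+.2f}")
```

Because the unembedding is linear in this toy, the per-layer contributions sum to the final accumulated value, so the logit lens curve is just the running total of the layer attribution bars. That relationship is one thing a drawing of either technique would presumably need to show.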

AI Interpretability (ML & AI)

No answers.

No comments.