A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda, 25 Oct 2022 20:24 UTC

LW: 52 AF: 15

AI Interpretability (ML & AI), Illusion of Transparency

A Mathematical Framework for Transformer Circuits is, in my opinion, the coolest paper I’ve ever had the privilege of working on. But it’s also very long, dense, and at times confusing, and this makes me sad! So I’ve run an experiment: I recorded myself reading through the paper and narrating a stream of consciousness as I go, covering which bits are particularly cool but under-appreciated, which bits are a bit of a waste of time, which bits I think do or do not replicate, and my attempts to explain the parts I think are particularly confusing. You can watch it here. Sadly, it turns out I have a lot of things to say about Transformer Circuits and this turned into a 3 hour monologue, but I hope it’s still useful! This is an experimental format for me for good research communication, and I’d love to hear feedback on how well it works for you! This was much easier to make than writing an entire paper, but could easily be a total waste of time if it’s not clear enough to be useful!

Disclaimer: The views in this video are entirely my personal takes—the paper was a team effort from everyone at Anthropic, especially Chris Olah, Nelson Elhage and Catherine Olsson, and I am no longer employed by Anthropic. I do not necessarily expect that any of the other authors would agree with any specific thing that I’ve said, but hope an unfiltered series of takes is useful!

johnswentworth 25 Oct 2022 21:24 UTC
LW: 18 AF: 4
Upvoted for content format—I would like to see more people do walkthroughs with their takes on a paper (especially their own), talking about what’s under-appreciated, a waste of time, replication expectations, etc.
- Neel Nanda 26 Oct 2022 11:29 UTC
  LW: 4 AF: 3
  Thanks! I’ve been pretty satisfied by just how easy this was: one-shot recording, no prep, something I can do in the evenings when I’m otherwise pretty low energy. Yet it still makes a product that seems good enough to be useful to people (even if it could be much better with more effort).
  
  I’m currently doing ones for the toy model paper and induction heads paper, and experimenting with recording myself while I do research.
  
  I’d love to see other people doing this kind of thing!
evand 22 May 2023 0:55 UTC
4 points
This was fantastic; thank you! I still haven’t quite figured it all out, so I’ll definitely have to watch it a second time (or at least some parts of it).
I think some sort of improved interface for your math annotations and diagrams would be a big benefit, whether that’s a drawing tablet or typing out some LaTeX or something else.
I think the section on induction heads and how they work could have used a bit more depth. Maybe a couple more examples, maybe some additional demos of how to play around with PySvelte, maybe something else. That’s the section I had the most trouble following.
You mentioned a couple additional papers in the video; having links in the description would be handy. I suspect I can find them easily enough as it is, though.
- Neel Nanda 22 May 2023 8:29 UTC
  4 points
  I appreciate the feedback! I have since bought a graphics tablet :) If you want to explore induction heads more, you may enjoy this tutorial.
  
  Any papers you’re struggling to find?
ojorgensen 26 Oct 2022 7:11 UTC
LW: 3 AF: 2
I went through the paper for a reading group the other day, and I think the video really helped me to understand what is going on in the paper. The parts I found most useful were the indications of which parts of the paper and maths were most important to understand, and which were not (tensor products).

I had made some effort to read the paper before with little success, but now feel like I understand the overall results of the paper pretty well. I’m very positive about this video, and about similar things being made in the future!

Personal context: I also found the intro to IB video series similarly useful. I’m an AI masters student who has some pre-existing knowledge about AI alignment. I have a maths background.
- Neel Nanda 26 Oct 2022 11:30 UTC
  LW: 3 AF: 2
  Thanks for the feedback! Glad to hear it was useful :)
  
  “intro to IB video series”: what do you mean by this?
  - ojorgensen 26 Oct 2022 13:46 UTC
    1 point
    Understanding Infra-Bayesianism :))