Nice little video: the audio is Neel Nanda explaining what mechanistic interpretability is and why he does it, illustrated by the illustrious Hamish Doodles. Excerpted from the AXRP episode.
(It’s not technically animation, I think, but I don’t know what other single word to use for “pictures that move a bit and change.”)
Lots of alpha in AI research distillers learning motion-canvas (“Visualize Complex Ideas Programmatically”, github.com/motion-canvas/motion-canvas) and making explainers.
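For a sense of what that workflow looks like, here’s a minimal scene sketch in the style of the motion-canvas quickstart; the package entry points and the circle example are assumed from the library’s docs, so treat it as illustrative rather than canonical:

```tsx
// Minimal motion-canvas scene sketch (assumed to follow the library's quickstart).
// Scenes are generator functions: each `yield*` plays one animation step.
import {Circle, makeScene2D} from '@motion-canvas/2d';
import {all, createRef} from '@motion-canvas/core';

export default makeScene2D(function* (view) {
  // Reference lets us animate the node after adding it to the scene.
  const circle = createRef<Circle>();

  view.add(
    <Circle ref={circle} x={-300} width={140} height={140} fill="#e13238" />,
  );

  // Slide the circle right and recolor it at the same time, then reverse.
  yield* all(
    circle().position.x(300, 1).to(-300, 1),
    circle().fill('#e6a700', 1).to('#e13238', 1),
  );
});
```

Because the whole explainer is just code like this, it’s the kind of thing an LLM could plausibly be prompted or finetuned to write.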
Or even better, finetuning an LLM to automate writing the code!
cyborgism, activate!
just don’t use an overly large model.
For those reading (I imagine Sheikh knows about these already), some videos from the creator of that library:
A single word for this would be an animatic, probably.
I kinda guess that most people don’t know what that means.
Here’s a dumb idea: if you have a misaligned AGI, can you keep it inside a box and have it teach you some things about alignment, perhaps through some creative lies?