My current workflow for studying the internal mechanisms of LLMs

This is a post to keep track of my research workflow for studying LLMs. Since I am doing this in my spare time, I want to keep the pipeline as simple as possible.

Step 1: Formulate a question about the model's behavior to investigate.

Step 2: Find the influential layer for the behavior

  • Output across layers

https://github.com/jalammar/ecco

  • Activation patching (ROME)

Notebook examples:

https://colab.research.google.com/drive/1uFui2i40eU0G9kvbCNTFMgXFHSB7lL9i

https://colab.research.google.com/github/UFO-101/an-neuron/blob/main/an_neuron_investigation.ipynb
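The idea behind activation patching can be sketched with a toy model. This is a hypothetical illustration, not the ROME implementation: run a clean and a corrupted input, cache the clean activations, then splice one layer's clean activation into the corrupted run and measure how much of the clean output is recovered. The layer names, sizes, and random "model" here are all made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a transformer's residual stream: each "layer" adds a
# nonlinear update to x. Real patching would hook a real model instead.
N_LAYERS, DIM = 4, 8
weights = [rng.normal(size=(DIM, DIM)) / np.sqrt(DIM) for _ in range(N_LAYERS)]

def run(x, patch_layer=None, patch_value=None):
    """Forward pass; optionally replace one layer's output with a cached one."""
    layer_outs = []
    for i, w in enumerate(weights):
        h = np.tanh(w @ x)
        if i == patch_layer:
            h = patch_value            # splice in the cached clean activation
        layer_outs.append(h)
        x = x + h                      # residual connection
    return x, layer_outs

clean_in = rng.normal(size=DIM)
corrupt_in = clean_in + rng.normal(size=DIM)   # stand-in for a corrupted prompt

clean_out, clean_acts = run(clean_in)
corrupt_out, _ = run(corrupt_in)

# Patch each layer in turn and measure how much closer the corrupted
# output moves toward the clean output.
baseline = np.linalg.norm(corrupt_out - clean_out)
effects = []
for layer in range(N_LAYERS):
    patched_out, _ = run(corrupt_in, patch_layer=layer, patch_value=clean_acts[layer])
    effects.append(baseline - np.linalg.norm(patched_out - clean_out))

best_layer = int(np.argmax(effects))
print("layer patching effects:", np.round(effects, 3))
print("most influential layer:", best_layer)
```

In a real setting the same loop runs over a model's layers via hooks (as the notebooks above do), and the metric is something like the logit difference of the answer token rather than a distance between output vectors.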

Step 3: Locate the influential neuron
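A common way to do this, once a layer is found, is to ablate or patch individual neurons in that layer and rank them by their effect on a scalar output metric. A minimal sketch under made-up values, zero-ablating one neuron at a time against a hypothetical readout direction:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical cached activations at the influential layer, and a
# readout direction whose dot product we monitor (e.g. a token logit).
DIM = 16
acts = rng.normal(size=DIM)
readout = rng.normal(size=DIM)

base = acts @ readout

effects = np.empty(DIM)
for n in range(DIM):
    ablated = acts.copy()
    ablated[n] = 0.0                 # zero-ablate a single neuron
    effects[n] = abs(base - ablated @ readout)

top_neuron = int(np.argmax(effects))
print("most influential neuron:", top_neuron)
```

For a linear readout this reduces to ranking neurons by |activation × readout weight|; on a real model the same loop runs with hooks and the actual logit of interest.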

Step 4: Visualize the neuron activation

  • Interactive Neuroscope

https://colab.research.google.com/github/neelnanda-io/TransformerLens/blob/main/demos/Interactive_Neuroscope.ipynb#scrollTo=Aa74dGVpF8lD
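The core of a Neuroscope-style view is just plotting one neuron's activation on each token so that spikes stand out. A minimal text-based stand-in, with made-up tokens and activation values:

```python
# Hypothetical per-token activations of a single neuron; in practice
# these would come from a cached forward pass over real text.
tokens = ["The", " an", " neuron", " fires", " on", " an", " article"]
acts = [0.10, 0.90, 0.20, 0.05, 0.10, 0.95, 0.15]

max_act = max(acts)
for tok, a in zip(tokens, acts):
    bar = "#" * int(20 * a / max_act)   # scale bars to the peak activation
    print(f"{tok:>10} | {a:5.2f} {bar}")
```

The interactive notebook above renders the same information as color-highlighted text, which is easier to scan over long passages.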

References:

  • We Found An Neuron in GPT-2
  • Interfaces for Explaining Transformer Language Models
  • 200 COP in MI: Studying Learned Features in Language Models
