Okay, you successfully nerd-sniped me into interpreting the model :)
I think I understand the role of {N1, N6, N7, N8} reasonably well. The activations post-$W_{hh}$ are well approximated by the linear model
$$W_{hh} h_{n,t} \approx a_n(\delta) \cdot M_t + b_n(\delta) \cdot \max(S_t, 0)$$
where $M_t$ is the running max, $S_t$ is the second running max, and $\delta$ represents how long ago the max value occurred. The coefficients $a_n(\delta)$ and $b_n(\delta)$ change with $\delta$ in pleasing patterns.
This model fits the activations well ($R^2 = 0.992$).[1]
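For concreteness, here is a minimal sketch of how a fit of this shape could be set up: compute the running max, the second running max and δ for each sequence, then solve a separate least-squares problem for each value of δ. Everything in it is an assumption for illustration, not what the notebook does: the function names `running_features` and `fit_per_delta` are made up, "second running max" is taken to mean the second-largest value seen so far, and the activations are synthetic stand-ins generated from invented coefficient patterns rather than read from the model.

```python
import numpy as np

def running_features(x):
    """Return (M, S, delta) for a 1-D sequence x.

    M[t]     : largest value in x[:t+1]
    S[t]     : second-largest value in x[:t+1] (-inf until two values are seen)
    delta[t] : number of steps since the current max occurred
    """
    n = len(x)
    M, S = np.empty(n), np.empty(n)
    delta = np.empty(n, dtype=int)
    best, second, best_pos = -np.inf, -np.inf, 0
    for t, v in enumerate(x):
        if v > best:
            best, second, best_pos = v, best, t
        elif v > second:
            second = v
        M[t], S[t], delta[t] = best, second, t - best_pos
    return M, S, delta

def fit_per_delta(acts, M, S, delta):
    """Fit acts ~ a(delta) * M + b(delta) * max(S, 0) separately for each delta.

    Returns ({delta: (a, b)}, overall R^2) for a single neuron.
    """
    X = np.column_stack([M, np.maximum(S, 0.0)])  # max(-inf, 0) = 0, so X is finite
    pred = np.zeros_like(acts, dtype=float)
    coefs = {}
    for d in np.unique(delta):
        mask = delta == d
        sol, *_ = np.linalg.lstsq(X[mask], acts[mask], rcond=None)
        coefs[int(d)] = (float(sol[0]), float(sol[1]))
        pred[mask] = X[mask] @ sol
    ss_res = np.sum((acts - pred) ** 2)
    ss_tot = np.sum((acts - acts.mean()) ** 2)
    return coefs, 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption): 200 random sequences of length 10.
rng = np.random.default_rng(0)
feats = [running_features(x) for x in rng.normal(size=(200, 10))]
M = np.concatenate([f[0] for f in feats])
S = np.concatenate([f[1] for f in feats])
delta = np.concatenate([f[2] for f in feats])

# Invented coefficient patterns, used only to check the fit recovers them.
acts = (1.0 / (1.0 + delta)) * M + 0.5 * np.maximum(S, 0.0)
coefs, r2 = fit_per_delta(acts, M, S, delta)
```

With real data, `acts` would be one neuron's post-$W_{hh}$ activation at each timestep, concatenated across sequences in the same order as the features. Fitting each δ separately keeps the model linear in $M_t$ and $\max(S_t, 0)$ while letting the coefficients vary freely with δ.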
This is far from a complete explanation by your standards. In particular:
I only have a partial mechanistic understanding of how the weights lead to this behavior. I think it’s entirely feasible to understand, but will take more time to unravel.
There are large parts of the model I haven’t looked at at all, e.g. the other 10 neurons. There are also parts of the task that I don’t know how the model does, e.g. tracking the current position of the 2nd-maximum value.
I may work more on this, but probably not for a couple of days, so it seemed worth posting my progress. Lots more detail on my understanding (e.g. a partial mechanistic understanding) in this notebook.
[1] Though more like 0.95 for some subsets.
Good start!