Recently, the focus of mechanistic interpretability work has shifted to thinking about “representations”, rather than strictly about entire algorithms
Recently? From what I can tell, this seems to have been a focus from the early days (1, 2).
That said, great post! I really appreciated your conceptual frames.
Recently? From what I can tell, this seems to have been a focus from the early days (1, 2).
That said, great post! I really appreciated your conceptual frames.