Chris_Leong comments on A Problem to Solve Before Building a Deception Detector

Chris_Leong 11 Apr 2025 17:35 UTC
> Recently, the focus of mechanistic interpretability work has shifted to thinking about “representations”, rather than strictly about entire algorithms

Recently? From what I can tell, this seems to have been a focus from the early days (1, 2).
That said, great post! I really appreciated your conceptual frames.