There have been other historical cases where authors credit prior interpretability work for capability advances; interpretability is not something that only the AIS people have done. But as far as I know, none of these claimed advances have actually contributed to state-of-the-art models, and in particular none have persisted with scaling. (The Bitter Lesson applies to almost all attempts to build additional structure into neural networks, it turns out.)
That’s not to say that it’s correct to publish everything! After all, given that so few capability advances stick, we both get very little signal from each individual case and face a potentially very large impact from any single interp-inspired capability advance that does stick. But I don’t think the H3 paper should be much of an update in either direction (beyond the fact that papers like H3 exist, and have existed in the past).
As an aside: The H3 paper was one of the reasons why the linked “Should We Publish Mech Interp” post was written—IIRC AIS people on Twitter were concerned about H3 as a capabilities advance resulting from interp, which sparked the discussion I had with Marius.
Out of curiosity, what are the other exceptions to this besides the obvious one of attention?
Off the top of my head: residual (skip) connections, improved ways of doing positional embeddings/encodings, and layer norm.
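For concreteness, two of those survivors compose naturally in a single block. Here's a toy NumPy sketch of a pre-norm residual sublayer (the function names and the specific MLP are illustrative, not from any particular codebase):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each vector to zero mean and unit variance along the last axis.
    # (Real layer norm also has learned scale/shift parameters, omitted here.)
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block(x, sublayer):
    # Skip connection: the sublayer only has to learn a *correction* to x,
    # which keeps gradients flowing through very deep stacks.
    # This is the "pre-norm" arrangement common in modern transformers.
    return x + sublayer(layer_norm(x))

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))          # a batch of 4 sixteen-dim activations
W = rng.normal(size=(16, 16)) * 0.1   # toy stand-in for an MLP sublayer
mlp = lambda h: np.maximum(h, 0.0) @ W
y = residual_block(x, mlp)            # same shape as x; output stays close to x
```

The point of the sketch is just that both pieces are *structure added by hand* rather than learned end-to-end, and yet (unlike most such additions) they got more useful with scale, not less.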