Oliver Daniels comments on A Pragmatic Vision for Interpretability

Oliver Daniels 2 Dec 2025 20:21 UTC
1 point
0
hmm yeah I guess I basically agree: free-form exploration is better in robustly useful settings, i.e. “let’s discover interesting things about current models” (though this exploration can still be useful for improving the realism of model organisms).

maybe I think methods work should be more focused on model organisms than prosaic problems.

There’s also the dynamic where, as capabilities improve, model organisms become more realistic and robust, but at the current margin I think it’s still more useful to add artificial properties than to solve prosaic problems.
- Neel Nanda 2 Dec 2025 22:15 UTC
  2 points
  0
  If you can get a good proxy for the eventual problem in a real model, I much prefer that, on realism grounds. E.g. eval awareness in Sonnet.