I think the issue is that when you get more understandable base components, and someone builds an AGI out of those, you still don’t understand the AGI.
That research is surely helpful, though, if it's being used to make better-understood things, rather than enabling folks to make worse-understood, more-powerful things.
I think moving in the direction of “insights are shared with groups the researcher trusts” should broadly help with this.
Hm, I should also ask if you've seen the results of current work and think it's evidence that we get more understandable models, more so than we get more capable models?