We want to build tools and frameworks to make interpretability with neural nets more accessible, and to help reframe conceptual problems in concrete terms.
Will you make your tools and frameworks open source so that, in addition to helping advance the work of your own researchers, they can help independent interpretability researchers and those working in other groups as well?
Probably. It is likely that we will publish a lot of our interpretability work and tools, but we can’t commit to that because, unlike some others, we think it’s almost guaranteed that some interpretability work will lead to very infohazardous outcomes, for example by revealing obvious ways in which architectures could be trained more efficiently. As such, we need to consider each result on a case-by-case basis. However, if we deem them safe, we would definitely like to share as many of our tools and insights as possible.