It does seem like a large proportion of disagreements in this space can be explained by how hard people think alignment will be. It seems like your view is actually more pessimistic about the difficulty of alignment than Eliezer’s, because he at least thinks it’s possible for mechinterp to help in principle.
I think that being confident in this level of pessimism is wildly miscalibrated, and such a big disagreement that it’s probably not worth discussing much further. Though I reply indirectly to your point here.
I personally think "pessimistic vs. optimistic" misframes the issue, because it casts a question about the world in terms of personal predispositions.
I would like to see the reasoning.
Your reasoning in the comment thread you linked to is:
“history is full of cases where people dramatically underestimated the growth of scientific knowledge, and its ability to solve big problems”
That’s a broad reference-class analogy to lean on. I think it holds little to no weight on whether there would be sufficient progress on the specific problem of “AGI” staying safe over the long term.
I wrote about why that specific problem would not be solvable.