joshc comments on AI alignment researchers don’t (seem to) stack

joshc 14 Mar 2023 6:16 UTC
7 points
0
The analogy between AI safety and math or physics is assumed in a lot of your writing, and I think it is a source of major disagreement with other thinkers. ML capabilities clearly isn’t the kind of field that requires building representations over the course of decades.

I think it’s possible that AI safety requires more conceptual depth than AI capabilities, but in those worlds, I struggle to see how the current ML paradigm coincides with conceptual ‘solutions’ that can’t be found via iteration at the end. In those worlds, we are probably doomed, so I’m betting on the worlds in which you are wrong and we must operate within the current empirical ML paradigm. It’s odd to me that you and Eliezer seem to think the current situation is very intractable, and yet you are confident enough in your beliefs that you won’t operate on the assumption that you are wrong about something in order to bet on a more tractable world.
- Noosphere89 28 Sep 2023 16:25 UTC
  2 points
  −1
  This is definitely an underrated point. In general, I tend to think that worlds where Eliezer/Nate Soares/Connor Leahy/very doomy people are right are worlds where humans just can’t do much of anything about AI extinction risks, and that the rational response is essentially to do what Elizabeth’s friend did (linked below), which is to leave AI safety/AI governance and do something else worthwhile, since neither you nor anyone else can do anything:
  
  https://www.lesswrong.com/posts/tv6KfHitijSyKCr6v/?commentId=Nm7rCq5ZfLuKj5x2G
  
  In general, I think you need certain assumptions in order to justify working on either AI safety or AI governance, and those assumptions set at least a soft ceiling on how doomy you can be. One of them is that feedback loops are available, which quite obviously rules out a lot of sharp left turns. More broadly, there’s a limit to how extreme your scenarios for the difficulty of safety can be before you can’t do anything at all, and I think a lot of classic LessWrong people like Nate Soares and Eliezer Yudkowsky, as well as more modern people like Connor Leahy, are way over the line of useful difficulty.