Richard_Ngo comments on Cosmopolitan values don’t come free

Richard_Ngo 31 May 2023 23:56 UTC
14 points
9
If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.
In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.
And I don’t think there’s any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they’re just starting to recursively self-improve.
Meta: I feel pretty annoyed by the phenomenon of which this current conversation is an instance, because when people keep saying things that I strongly disagree with which will be taken as representing a movement that I’m associated with, the high-integrity (and possibly also strategically optimal) thing to do is to publicly repudiate those claims*, which seems like a bad outcome for everyone. I model it as an epistemic prisoner’s dilemma with the following squares:
D, D: doomers talk a lot about “everyone dies with >90% confidence”; non-doomers publicly repudiate those arguments
C, D: doomers talk a lot about “everyone dies with >90% confidence”; non-doomers let those arguments become the public face of AI alignment despite strongly disagreeing with them
D, C: doomers apply higher epistemic standards on this issue (from the perspective of non-doomers); non-doomers keep applying pressure to doomers to “sanitize” even more aspects of their communication
C, C: doomers apply higher epistemic standards on this issue (from the perspective of non-doomers); non-doomers support doomers making their arguments
I model us as being in the C, D square and I would like to move to the C, C square so I don’t have to spend my time arguing about epistemic standards or repudiating arguments from people who are also trying to prevent AI xrisk. I expect that this is basically the same point that Paul is making when he says “if we can’t get on the same page about our predictions I’m at least aiming to get folks to stop arguing so confidently for death given takeover”.
I expect that you’re worried about ending up in the D, C square, so in order to mitigate that concern I’m open to making trades on other issues where doomers and non-doomers disagree; I expect you’d know better than I do what trades would be valuable for you here. (One example of me making such a trade in the past was including a week on agent foundations in the AGISF curriculum despite inside-view not thinking it was a good thing to spend time on.) For example, I am open to being louder in other cases where we both agree that someone else is making a bad argument (but which don’t currently meet my threshold for “the high-integrity thing is to make a public statement repudiating that argument”).
* my intuition here is based on the idea that not repudiating those claims is implicitly committing a multi-person motte and bailey (but I can’t find the link to the post which outlines that idea). I expect you (Habryka) agree with this point in the abstract because of previous cases where you regretted not repudiating things that leading EAs were saying, although I presume that you think this case is disanalogous.
- habryka 1 Jun 2023 0:10 UTC
  24 points
  0
  Parent
  > Meta: I feel pretty annoyed by the phenomenon of which this current conversation is an instance, because when people keep saying things that I strongly disagree with which will be taken as representing a movement that I’m associated with, the high-integrity (and possibly also strategically optimal) thing to do is to publicly repudiate those claims*, which seems like a bad outcome for everyone.
  For what it’s worth, I think you should just say that you disagree with it? I don’t really understand why this would be a “bad outcome for everyone”. Just list out the parts you agree on, and list the parts you disagree on. Coalitions should mostly be based on epistemological principles and ethical principles anyways, not object-level conclusions, so at least in my model of the world repudiating my statements if you disagree with them is exactly what I want my allies to do.
  If, on the other hand, you think the kind of errors you are seeing are evidence of deeper epistemological or ethical problems, such that you no longer want to be in an actual coalition with the relevant people (or think that the costs of being perceived to be in some trade-coalition with them would outweigh the benefits of actually being in that coalition), I think it makes sense to socially distance yourself from the relevant people. Even then, I think your public statements should mostly just accurately reflect how much you are indeed deferring to individuals, how much trust you are putting into them, how much you are engaging in reputation-trades with them, etc.
  - Richard_Ngo 1 Jun 2023 0:19 UTC
    10 points
    2
    Parent
    When I say “repudiate” I mean a combination of publicly disagreeing + distancing. I presume you agree that this is suboptimal for both of us, and my comment above is an attempt to find a trade that avoids this suboptimal outcome.
    Note that I’m fine to be in coalitions with people when I think their epistemologies have problems, as long as their strategies are not sensitively dependent on those problems. (E.g. presumably some of the signatories of the recent CAIS statement are theists, and I’m fine with that as long as they don’t start making arguments that AI safety is important because of theism.) So my request is that you make your strategies less sensitively dependent on the parts of your epistemology that I have problems with (and I’m open to doing the same the other way around in exchange).
- habryka 1 Jun 2023 0:55 UTC
  11 points
  3
  Parent
  > If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.
  > In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.
  > And I don’t think there’s any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they’re just starting to recursively self-improve.
  This feels like it somewhat misunderstands my point. I don’t expect the reflection process I will go through to feel predictably horrifying from the inside. But I do expect the reflection process the AI will go through to feel horrifying to me (because the AI does not share all my metaethical assumptions, and preferences over reflection, and environmental circumstances, and principles by which I trade off values between different parts of me).
  This feels like a pretty common experience. Many people in EA seem to quite deeply endorse various things like hedonic utilitarianism, in a way where the reflection process that led them to that opinion feels deeply horrifying to me. Of course it didn’t feel deeply horrifying to them (or at least it didn’t on the dimensions that were relevant to their process of meta-ethical reflection), otherwise they wouldn’t have done it.