habryka comments on Cosmopolitan values don’t come free

habryka 1 Jun 2023 0:55 UTC
12 points
> If the result of an optimization process will be predictably horrifying to the agents which are applying that optimization process to themselves, then they will simply not do so.
> In other words: AIs which feel anything in the vicinity of kindness before applying cosmic amounts of optimization pressure to themselves will try to steer that optimization pressure towards something which is recognizably kind at the end.
> And I don’t think there’s any good argument for why AIs will lack any scrap of kindness with very high confidence at the point where they’re just starting to recursively self-improve.
This feels like it somewhat misunderstands my point. I don’t expect the reflection process I will go through to feel predictably horrifying from the inside. But I do expect the reflection process the AI will go through to feel horrifying to me, because the AI does not share all of my metaethical assumptions, my preferences over reflection, my environmental circumstances, or the principles by which I trade off values between different parts of me.
This feels like a pretty common experience. Many people in EA seem to quite deeply endorse views like hedonic utilitarianism, in a way where the reflection process that led them to that opinion feels deeply horrifying to me. Of course it didn’t feel deeply horrifying to them (or at least it didn’t on the dimensions that were relevant to their process of metaethical reflection); otherwise they wouldn’t have gone through with it.