(E.g., religious zealots with a preference for there being a bunch of people burning in hell. One might naively guess that their ideal worlds are those in which Hell-deserving people don’t exist to begin with… But I don’t feel sure about this at all.)
I think CEV interfacing with wrong beliefs is a bit tricky. My strong guess is that the vast majority of people whose minds are intact will end up preferring to believe almost no false things that have major implications for what to do with the future. It’s just extremely adaptive to have true beliefs instead of false ones, and people know that.[1]
I do think you can internalize something so deeply that you reorient your whole epistemology around it, which I cover a bit in the section on broken minds. I agree that religious zealotry around stuff like people in hell is scary, but I do think it requires a very strong commitment to the bit to keep believing that even if you know everything there is to know about the world (which I don’t think would apply to almost anyone in a position of power).
Worlds optimal by their values would plausibly contain demographics optimized for this purpose. Their suffering would not be accidental; it would be the whole point.
I think my opinion here is at the intersection of my section on torturing Bob and my section on optimized cows. My best guess is that you do have some chance of having some people you terrorize around, but I think that preference is less likely to turn out scope-sensitive. And then, even if you do want people to terrorize, my guess is that the ideal mind on which to project your terror is probably also not the kind of mind whose suffering I am actually that worried about.
I don’t think either of these is enormously strong, but they are strong enough to create something like 90%+ confidence for Putin in particular (I agree there is lots of wiggle room left in how common these traits are overall).
But also, IDK, my sense is people really really like to imagine their enemies as more fundamentally evil than they are, and I doubt that evil and power correlate strongly enough to remotely explain why every nation always villainizes the leaders of its enemies. My best guess is that Putin is ~90th-percentile evil, if one were to try to construct a linear scale here. Most of the bits of selection need to go into competence, not evilness. And so when someone attributes preferences like “he will reshape the universe to be filled with people he can terrorize”, I feel like “this is someone running the evil-sounding sentence generator” is much more likely than “this is an actually legitimate preference I expect him to have”.
[1] This seems to be a recurring crux for people in a few different contexts, so if people actually have uncertainty on this, I wonder whether there is some kind of simple survey we could run. If there were a reflectively stable preference for people to believe lots of wrong things, that would certainly make my case harder.
I agree that religious zealotry around stuff like people in hell is scary, but I do think it requires a very strong commitment to the bit to keep believing that even if you know everything there is to know about the world
Mm, you seem to be assuming that the value system of our hypothetical religious zealot is clearly structured such that they want people in hell because God wants that (or whatever) – meaning that if they learned that there’s no God, those downstream preferences would also dissolve. I don’t know that this is the case. It seems plausible to me that the preference for Hell will have ended up terminal, a picture of what a “just” world looks like. If so, it would survive God’s own dissolution.
More generally… Hm, religious zealotry does feel like a particularly bad case. But I think it extends to non-religious ideology-poisoned people as well. Like, if someone really really hates some demographic X, it may be the case that their ideal world doesn’t have that demographic at all… Or it may feel more natural and “correct” for them that this demographic exists, but in pain.
It seems to me that deep cruelty, and the desire to regularly exercise it, is part of the default human-values package. If yes, CEVing random people[1] is going to instantiate worlds in which it’s very present. I may be overly cynical on that, I’m not sure.
My best guess is that you do have some chance of having some people you terrorize around, but I think that preference is less likely to turn out scope-sensitive
If I understand correctly, your mental picture here is that the dictator would allow a vast cosmos-sprawling civilization to exist, and that most of it will be relatively free to flourish, except the relatively small bubble around the dictator, in which they would exercise their preference for terror?
But would a very selfish person’s preference for other people existing be scope-sensitive? I think plausibly not: a selfish dictator wouldn’t necessarily care to create that cosmos-sprawling civilization. Instead, they may just mothball most of the resources in reach so they can prolong their own life, and only maintain a small bubble of activity in their immediate vicinity.
In that case, the scope of terror would be approximately the same as the scope of flourishing. Possibly adding up to net-negative eudaimonia? Or, at least, to astronomically less eudaimonia than is possible, such that choosing this over omnicide is not a no-brainer.
But also, IDK, my sense is people really really like to imagine their enemies as more fundamentally evil than they are
To be clear, I don’t think Putin is the epitome of evil. I don’t even know that he’s 90th-percentile evil, if we define evil as “an active preference for there to be suffering”. Rather, I think such people end up selected for callousness first and foremost. And then past that, it seems pretty easy for their CEV to end up something like the above “small bubble of activity in which everyone lives and dies at their whims”.
Edit: Note that we’ve ended up discussing two very different scenarios: net-negative CEVs where mass suffering is present because of ideological views about how the world ought to look, and CEVs that are net-negative because of very scope-insensitive/selfish preferences that aren’t clearly dominated by positive values.
It seems to me that deep cruelty, and the desire to regularly exercise it, is part of the default human-values package. If yes, CEVing random people is going to instantiate worlds in which it’s very present.
This doesn’t follow. Something could be part of the default human-values package, but also reliably discarded under reflection.
Right, I guess I meant something like “cruelty is clearly part of the default human-values package, and it seems to me that many people may endorse it under reflection”.
But would a very selfish person’s preference for other people existing be scope-sensitive? I think plausibly not: a selfish dictator wouldn’t necessarily care to create that cosmos-sprawling civilization. Instead, they may just mothball most of the resources in reach so they can prolong their own life, and only maintain a small bubble of activity in their immediate vicinity.
The vast majority of people seem to want children, and for their children to have children, and that alone would fill the cosmos in due time. This is definitely a preference on which some people can differ, but it still seems pretty close to universal.
(Responded to that one because it’s easy, will respond to the rest after I have slept)
I’m more pessimistic than you, even. Someone going through the CEV process, and having an ASI optimize the universe to that CEV, is undergoing apotheosis, becoming a god. So I don’t think their beliefs need to survive the dissolution of gods, they just need to survive the realization that they are a god. If their prior beliefs require that god has been crucified and resurrected, for example, they can have that experience. If their prior belief is that a god is okay with people going to hell, and their current belief is that they are a god and they are okay with people going to hell, there is no conflict that requires a reassessment of the morality of people going to hell.
Sure, maybe it all works out for the best, but I would rather gamble on the CEV of a virtue-aligned entity, if I had to gamble the universe.
[1] As opposed to CEVing humanity-as-a-whole, which, IIRC, is the original CEV target.