Taking over the world is a big enough prize, compared to the wealth of a typical agent, that even a small chance of achieving it should already be enough to act. And waiting is dangerous if there’s a chance of other agents outrunning you. So multiple agents having DSA but not acting for uncertainty reasons seems unlikely.
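A hedged way to see this: for a risk-neutral agent, the expected value of acting on a DSA can dominate waiting even when the success probability is small, because the prize dwarfs the agent's current wealth. The numbers below are purely illustrative assumptions, not estimates.

```python
# Sketch with hypothetical numbers: expected value of acting on a
# decisive strategic advantage (DSA) vs. waiting, for a risk-neutral agent.
wealth = 1e9       # typical agent's current resources (assumed)
prize = 1e15       # value of taking over the world (assumed)
p_success = 0.01   # small chance the DSA attempt succeeds (assumed)
p_outrun = 0.05    # chance a rival acts first if you wait (assumed)

# Simplistic model: on failure you get nothing; waiting risks being outrun.
ev_act = p_success * prize
ev_wait = (1 - p_outrun) * wealth

print(ev_act > ev_wait)  # True: even a 1% shot dominates for this agent
```

The point is only that the inequality holds across a very wide range of assumed values, since the prize exceeds typical wealth by many orders of magnitude.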
Logical vs physical risk aversion
Imagine you care about the welfare of two koalas living in separate rooms. Given a choice between both koalas dying with probability 1⁄2 or a randomly chosen koala dying with probability 1, why is the latter preferable?
You could say our situation is different because we’re the koala. Fine. Imagine you’re choosing between a 1⁄2 physical risk and a 1⁄2 logical risk to all humanity, but both of them will happen in 100 years when you’re already dead, so the welfare of your copies isn’t in question. Why is the physical risk preferable? How is that different from the koala situation?
Taking over the world is a big enough prize, compared to the wealth of a typical agent, that even a small chance of achieving it should already be enough to act.
In CAIS (Comprehensive AI Services), AI services aren’t agents themselves, especially the lower-level ones. If they’re controlled by humans, their owners/operators could well be risk averse enough (equivalently, not assign high enough utility to taking over the world) to not take advantage of a DSA given their uncertainty.
Imagine you’re choosing between a 1⁄2 physical risk and a 1⁄2 logical risk to all humanity, but both of them will happen in 100 years when you’re already dead, so the welfare of your copies isn’t in question. Why is the physical risk preferable?
I don’t think it’s possible for the welfare of my copies to not be in question. See this comment.
Another line of argument: suppose we’ll end up getting most of our utility from escaping simulations and taking over much bigger/richer universes. In those bigger universes we might eventually meet up with copies of us from other Everett branches and have to divide up the universe with them. So physical risk isn’t as concerning in that scenario, because the surviving branches will end up with larger shares of the base universes.
A similar line of thought is that in an acausal trade scenario, each surviving branch of a physical risk could get a better deal because whatever thing of value they have to offer has become more scarce in the multiverse economy.
Many such intuitions seem to rely on “doors” between worlds. That makes sense—if we have two rooms of animals connected by a door, then killing all animals in one room will just lead to it getting repopulated from the other room, which is better than killing all animals in both rooms with probability 1⁄2. So in that case there’s indeed a difference between the two kinds of risk.
The question is, how likely is a door between two Everett branches, vs. a door connecting a possible world with an impossible world? With current tech, both are impossible. With sci-fi tech, both could be possible, and based on the same principle (simulating whatever is on the other side of the door). But maybe “quantum doors” are more likely than “logical doors” for some reason?
Another argument that definitely doesn’t rely on any sort of “doors” for why physical risk might be preferable to logical risk is just if you have diminishing returns on the total number of happy humans. As long as your returns to happy humans are sublinear (logarithmic is a standard approximation, though anything sublinear works), then you should prefer a guaranteed shot at 1⁄2 the Everett branches having lots of happy humans to a 1⁄2 chance of all the Everett branches having happy humans. To see this, suppose $U : \mathbb{N} \to \mathbb{R}$ measures your returns to the total number of happy humans across all Everett branches. Let $N$ be the total number of happy humans in a good Everett branch and $M$ the total number of Everett branches. Then, in the physical risk situation, you get

$$U_{\text{physical risk}} = U\left(\sum_{i=1}^{M/2} N\right) = U\left(\frac{MN}{2}\right)$$

whereas, in the logical risk situation, you get

$$U_{\text{logical risk}} = \frac{1}{2}U(0) + \frac{1}{2}U\left(\sum_{i=1}^{M} N\right) = \frac{1}{2}U(MN)$$
which are only equal if U is linear. Personally, I think my returns are sublinear, since I pretty strongly want there to at least be some humans—more strongly than I want there to be more humans, though I want that as well. Furthermore, if you believe there’s a chance that the universe is infinite, then you should probably be using some sort of measure over happy humans rather than just counting the number, and my best guess for what such a measure might look like seems to be at least somewhat locally sublinear.
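For concreteness, the comparison above can be checked numerically with a logarithmic $U$. The branch count and population figures below are arbitrary assumptions chosen only to illustrate the inequality.

```python
import math

# Sketch: physical vs. logical risk under sublinear (logarithmic) returns.
M = 1000     # total number of Everett branches (assumed)
N = 10**9    # happy humans per good branch (assumed)

def U(x):
    """Sublinear returns to happy humans: log(1 + x), so U(0) = 0."""
    return math.log1p(x)

# Physical risk: half the branches survive with certainty.
u_physical = U(M * N / 2)

# Logical risk: 1/2 chance all branches survive, 1/2 chance none do.
u_logical = 0.5 * U(0) + 0.5 * U(M * N)

print(u_physical > u_logical)  # True: sublinear U prefers the physical risk
```

Because $U$ is concave, this holds for any positive choice of $M$ and $N$, which is just Jensen’s inequality applied to the two lotteries.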
So you’re saying that (for example) there could be a very large universe that is running simulations of both possible worlds and impossible worlds, and therefore even if we go extinct in all possible worlds, versions of us that live in the impossible worlds could escape into the base universe so the effect of a logical risk would be similar to a physical risk of equal magnitude (if we get most of our utility from controlling/influencing such base universes). Am I understanding you correctly?
If so, I have two objections to this. 1) Some impossible worlds seem impossible to simulate. For example suppose in the actual world AI safety requires solving metaphilosophy. How would you simulate an impossible world in which AI safety doesn’t require solving metaphilosophy? 2) Even for the impossible worlds that maybe can be simulated (e.g., where the trillionth digit of pi is different from what it actually is) it seems that only a subset of reasons for running simulations of possible worlds would apply to impossible worlds, so I’m a lot less sure that “logical doors” exist than I am that “quantum doors” exist.
It seems to me that AI will need to think about impossible worlds anyway—for counterfactuals, logical uncertainty, and logical updatelessness/trade. That includes worlds that are hard to simulate, e.g. “what if I try researching theory X and it turns out to be useless for goal Y?” So “logical doors” aren’t that unlikely.