I think I’m using “anthropic” in a way consistent with the end of the first paragraph of Fundamentals of kicking anthropic butt (to refer to situations in which agents get duplicated and/or there is some uncertainty about what agent an agent is). If there’s a more appropriate word then I’d appreciate knowing what it is.
My first objection is already contained in Vladimir_Nesov’s comment: it seems like in general anthropic problems should be phrased entirely as decision problems and not as problems involving the assignment of odds. For example, Sleeping Beauty can be turned into two decision problems: one in which Sleeping Beauty is trying to maximize the expected number of times she is right about the coin flip, and one in which Sleeping Beauty is trying to maximize the probability that she is right about the coin flip. In the first case, Sleeping Beauty’s optimal strategy is to guess tails, whereas in the second case it doesn’t matter what she guesses. In a problem where there’s no anthropic funniness, there’s no difference between trying to maximize the expected number of times you’re right and trying to maximize the probability that you’re right, but with anthropic funniness there is.
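To make the difference concrete, here is a minimal Monte Carlo sketch (my own illustration, assuming the standard protocol: Beauty is woken once if the coin lands heads and twice if it lands tails, and she makes the same fixed guess at every awakening):

```python
import random

def run(strategy: str, trials: int = 100_000) -> tuple[float, float]:
    """Simulate Sleeping Beauty with a fixed guess ('heads' or 'tails').

    Returns (expected number of correct answers per experiment,
             probability that the guess matches the coin)."""
    total_correct_answers = 0
    experiments_right = 0
    for _ in range(trials):
        coin = random.choice(["heads", "tails"])
        awakenings = 1 if coin == "heads" else 2  # tails: woken and asked twice
        if strategy == coin:
            total_correct_answers += awakenings   # right at every awakening
            experiments_right += 1
    return total_correct_answers / trials, experiments_right / trials

for s in ("heads", "tails"):
    avg_correct, p_right = run(s)
    print(f"guess {s}: E[# correct answers] ~ {avg_correct:.2f}, "
          f"P(right about the coin) ~ {p_right:.2f}")

# Guessing tails roughly doubles the expected number of correct answers
# (~1.0 vs ~0.5), while the probability of being right about the flip
# stays ~0.5 either way.
```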
My second objection is that I don’t understand how an agent could be convinced of the truth of a sufficiently bizarre premise. (I have the same issue with Pascal’s mugging, torture vs. dust specks, and Newcomb’s problem.) In this particular case, I don’t understand how I could be convinced that another agent really has the capacity to perfectly simulate me. This seems like exactly the kind of thing that agents would be incentivized to lie about in order to trick me.
My second objection is that I don’t understand how an agent could be convinced of the truth of a sufficiently bizarre premise. (I have the same issue with Pascal’s mugging, torture vs. dust specks, and Newcomb’s problem.) In this particular case, I don’t understand how I could be convinced that another agent really has the capacity to perfectly simulate me. This seems like exactly the kind of thing that agents would be incentivized to lie about in order to trick me.
You may eventually obtain the capacity to perfectly simulate yourself, in which case you’ll run into similar issues. I used Omega in a scenario a couple of years ago that’s somewhat similar to the OP’s, but really Omega is just a shortcut for establishing a “clean” scenario that’s relatively free of distractions so we can concentrate on one specific problem at a time. There is a danger of using Omega to construct scenarios that have no real-world relevance, and that’s something we should keep in mind, but I don’t think that’s the case in the examples you gave.
How would you characterize your issue with Pascal’s mugging? The dilemma is not supposed to require being convinced of the truth of the proposition, just assigning it a non-zero probability.
Regarding anthropic reasoning, I always understood the term to refer to situations in which you could have been killed/prevented from existing.
it seems like in general anthropic problems should be phrased entirely as decision problems and not as problems involving the assignment of odds
How then, do you assign the odds?
My second objection is that I don’t understand how an agent could be convinced of the truth of a sufficiently bizarre premise. (I have the same issue with Pascal’s mugging, torture vs. dust specks, and Newcomb’s problem.) In this particular case, I don’t understand how I could be convinced that another agent really has the capacity to perfectly simulate me. This seems like exactly the kind of thing that agents would be incentivized to lie about in order to trick me.
You believe Omega because it’s Omega, who always tells the truth and has access to godlike power.
How does any particular agent go about convincing me that it’s Omega?
I don’t know, but Omega does. Probably by demonstrating the ability to do something such that you believe the chance that it could be faked is epsilon^2, where epsilon is your prior belief that a given agent could have godlike powers.
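To spell out the arithmetic behind that comment (my reconstruction, not something stated in the thread): write epsilon for the prior probability that a given agent is godlike, and suppose the demonstration is certain given godlike power but has probability at most epsilon^2 of being faked otherwise. Bayes in odds form then gives

\[
\frac{P(\text{godlike}\mid\text{demo})}{P(\text{not godlike}\mid\text{demo})}
= \frac{\varepsilon}{1-\varepsilon}\cdot\frac{1}{\varepsilon^{2}}
\approx \frac{1}{\varepsilon},
\]

so the posterior probability of “godlike” ends up near 1 − epsilon even though the prior was only epsilon.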
So … you don’t know what the odds are, but you know how to act anyway? I notice that I am confused.
it’s Omega, who always tells the truth and has access to godlike power.
How does any particular agent go about convincing me that it’s Omega?
Assuming that there is only one godlike agent known or predicted in the environment, that Omega is a known feature of the environment, and that you have no reason to believe that e.g. you are hallucinating, then presumably all Omega needs to do is demonstrate his godlike powers—by predicting your every action ahead of time, say, or turning the sky green with purple spots.
So … you don’t know what the odds are, but you know how to act anyway? I notice that I am confused.
In an anthropic situation, I don’t think it makes sense to assign odds to statements like “I will see X” because the meaning of the term “I” becomes unclear. (For example, I don’t think it makes sense for Sleeping Beauty to assign odds to statements like “I will see heads.”) I can still assign odds to statements like “exactly fifteen copies of me will see X” by reasoning about what I currently expect my copies to see, given what I know about how I’ll be copied, and using those odds I can still make decisions.
Assuming that there is only one godlike agent known or predicted in the environment, that Omega is a known feature of the environment, and that you have no reason to believe that e.g. you are hallucinating, then presumably all Omega needs to do is demonstrate his godlike powers—by predicting your every action ahead of time, say, or turning the sky green with purple spots.
Omega needs to both always tell the truth and have access to godlike power. How does Omega prove to me that it always tells the truth?
I don’t think it makes sense to assign odds to statements like “I will see X” because the meaning of the term “I” becomes unclear. (For example, I don’t think it makes sense for Sleeping Beauty to assign odds to statements like “I will see heads.”)
I don’t understand this, TBH, but whatever.
What do you think Alice should choose?
Omega needs to both always tell the truth and have access to godlike power. How does Omega prove to me that it always tells the truth?
It is a known feature of the environment that people are regularly ambushed by an agent, calling itself Omega, which has never yet been known to lie and has access to godlike power.
There’s nothing to choose until you’ve specified a decision problem.
It is a known feature of the environment that people are regularly ambushed by an agent, calling itself Omega, which has never yet been known to lie and has access to godlike power.
An agent with godlike power can manufacture evidence that it has any other traits it wants, so observing such evidence isn’t actually evidence that it has those traits.
There’s nothing to choose until you’ve specified a decision problem.
Um, I have.
Imagine, for simplicity, a purely selfish agent. Call it Alice. Alice is an expected utility maximizer, and she gains utility from eating cakes. Omega appears and offers her a deal—they will flip a fair coin, and give Alice three cakes if it comes up heads. If it comes up tails, they will take one cake away from her stockpile. Alice runs the numbers, determines that the expected utility is positive, and accepts the deal. Just another day in the life of a perfectly truthful superintelligence offering inexplicable choices.
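For concreteness, the numbers Alice runs, assuming she values each cake at one util (my assumption, not something stated in the scenario), are just

\[
E[\Delta u] = \tfrac{1}{2}(+3) + \tfrac{1}{2}(-1) = +1 > 0.
\]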
The next day, Omega returns. This time, they offer a slightly different deal—instead of flipping a coin, they will perfectly simulate Alice once. This copy will live out her life just as she would have done in reality—except that she will be given three cakes. The original Alice, however, receives nothing.
What do you think Alice should choose?
An agent with godlike power can manufacture evidence that it has any other traits it wants, so observing such evidence isn’t actually evidence that it has those traits.
An agent with godlike power can manufacture evidence of anything. This seems suspiciously like a Fully General Counterargument. Such an agent could, in any case, directly hack your brain so you don’t realize it can falsify evidence.
Oh, you were referring to the original decision problem. I was referring to the part of the post I was originally responding to, about what subjective odds Alice should assign to things. I just ran across an LW post making an important comment on these types of problems, which is that the answer depends on how altruistic Alice feels towards copies of herself. If she feels perfectly altruistic towards copies of herself, then sure, take the cakes.
An agent with godlike power can manufacture evidence of anything. This seems suspiciously like a Fully General Counterargument. Such an agent could, in any case, directly hack your brain so you don’t realize it can falsify evidence.
Yes, it’s a fully general counterargument against believing anything that an agent with godlike power says. Would this sound more reasonable if “godlike” were replaced with “devil-like”?
I just ran across an LW post making an important comment on these types of problems, which is that the answer depends on how altruistic Alice feels towards copies of herself. If she feels perfectly altruistic towards copies of herself, then sure, take the cakes.
That’s … a very good point, actually.
I assume, on this basis, that you agree with the hypothetical at the end?
Would this sound more reasonable if “godlike” were replaced with “devil-like”?
Yes. Devil-like implies known hostility.
By this logic, the moment you turn on a provably Friendly AI you should destroy it, because it might have hacked you into thinking it’s friendly. Worse still, a hostile god would presumably realize you won’t believe it, and so hack you into thinking it’s not godlike; so anything claiming not to be godlike is lying.
Bottom line: gods are evidence that the world may be a lie. But not strong evidence.
I assume, on this basis, that you agree with the hypothetical at the end?
Nope. I still don’t agree that simulating an agent N times and doing X to one of them is morally equivalent to exposing them to X with probability 1/(N+1). For example, if you are not at all altruistic towards copies of yourself, then you don’t care about the former situation as long as the copy that X is being done to is not you. On the other hand, if you value fairness among your copies (that is, if you value your copies having similar quality of life), then you care more strongly about the former situation than the latter.
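To make that contrast concrete, here is a small sketch. Every specific number and valuation rule in it (a disutility of -10 for X, N = 4 copies, the particular fairness penalty) is an illustrative assumption of mine, not anything taken from the thread:

```python
# Situation A: N+1 copies exist and X is done to exactly one of them,
# stipulated here to be a copy other than "you".
# Situation B: a single agent risks X with probability 1/(N+1).
N = 4
X = -10.0                      # made-up disutility of X
copies_A = [X] + [0.0] * N     # one copy suffers X, the rest are untouched
p_B = 1.0 / (N + 1)
expected_B = p_B * X

def selfish(my_outcome):
    # Care only about what happens to "this" copy.
    return my_outcome

def average_over_copies(outcomes):
    # Average-utilitarian valuation across all copies.
    return sum(outcomes) / len(outcomes)

def fairness_penalised(outcomes, weight=0.5):
    # Average value minus a crude penalty for inequality among copies.
    return average_over_copies(outcomes) - weight * (max(outcomes) - min(outcomes))

print("selfish:   A =", selfish(0.0),                  " B =", expected_B)
print("average:   A =", average_over_copies(copies_A), " B =", expected_B)
print("fairness:  A =", fairness_penalised(copies_A),  " B =", expected_B)

# Only the average valuation rates A and B the same (-2.0 each); the selfish
# valuation is indifferent to A (0.0) but not to B (-2.0), and the
# fairness-penalised valuation dislikes A (-7.0) more than B (-2.0).
```

The point is only that whether the two situations come out equivalent depends entirely on how the agent aggregates over her copies.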
By this logic, the moment you turn on a provably Friendly AI you should destroy it, because it might have hacked you into thinking it’s friendly. Worse still, a hostile god would presumably realize you won’t believe it, and so hack you into thinking it’s not godlike; so anything claiming not to be godlike is lying.
Pretty much the only thing a godlike agent can convince me of is that it’s godlike (and I am not even totally convinced this is possible). After that, again, whatever evidence a godlike agent presents of anything else could have been fabricated. Your last inference doesn’t follow from the others; my priors regarding the prevalence of godlike agents are currently extremely low, and claiming not to be godlike is not strong evidence either way.
To be clear, since humans are specified as valuing all agents (including sims of themselves and others), shouldn’t it be equivalent to Alice-who-values-copies-of-herself?
my priors regarding the prevalence of godlike agents are currently extremely low,
And what are those priors based on? Evidence! Evidence that a godlike being would be motivated to falsify!
To be clear, since humans are specified as valuing all agents (including sims of themselves and others), shouldn’t it be equivalent to Alice-who-values-copies-of-herself?
Sure, but the result you describe is equivalent to Alice being an average utilitarian with respect to copies of herself. What if Alice is a total utilitarian with respect to copies of herself?
You keep using that word. I don’t think it means what you think it means.
Seriously, though, what do you think the flaw in the argument is, as presented in your quote?
How would you characterize your issue with Pascal’s mugging? The dilemma is not supposed to require being convinced of the truth of the proposition, just assigning it a non-zero probability.
Hmm. You’re right. Upon reflection, I don’t have a coherent rejection of Pascal’s mugging yet.
Gotcha. Your posts have seemed pretty thoughtful so far so I was surprised by / curious about that comment. :)
How then, do you assign the odds?
You don’t.
Sure, but the result you describe is equivalent to Alice being an average utilitarian with respect to copies of herself. What if Alice is a total utilitarian with respect to copies of herself?
Actually, she should still make the same choice, although she would choose differently in other scenarios.
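As a quick check of that claim with the cake numbers from the scenario (again treating one cake as one util, my assumption): under the simulation deal the copy gets three cakes and the original gets none, so

\[
\text{total over copies: } 3 + 0 = 3 > 0, \qquad
\text{average over copies: } \tfrac{3+0}{2} = 1.5 > 0,
\]

and a total utilitarian and an average utilitarian with respect to her copies both accept. The two valuations only come apart in scenarios where the total and the average of the copies’ utilities move in opposite directions.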