The problem is that I worry that I have thought about the situation in enough depth that I am likely to be targeted, even if I don’t ‘cooperate’.
It requires other people to think in enough depth to pick you out as a target. Admittedly, this is made easier by the fact that you are posting about it online.
Have you thought in enough depth that you’ve helped the acausal extortionist to target other people? That may be evidence about whether other people have done so with you.
My fear isn’t about people doing it; I’m more worried about an ASI. I’m sure one would have no shortage of computational capacity to expend on thinking through my own thoughts.
This still requires people to design an AI that is prone to engaging in acausal extortion, and it’s unclear what their motive for doing so would be.
I don’t think people will do this deliberately, but that it is an instrumentally convergent thing for an ASI to do, all else (such as other superintelligences conspiring to enforce acausal norms) being equal.
Acausal stuff isn’t instrumentally convergent in the usual sense, though. If you’re really good at computing counterfactuals, it may be instrumentally convergent to self-modify into or create an agent that does acausal deals, but the convergence only extends to deals that start in the future relative to where you’re deciding from.
“but the convergence only extends to deals that start in the future relative to where you’re deciding from.” I don’t really know what you mean by this. Do you mean that the optimal decision theory for a powerful agent to adopt is some kind of hybrid where it considers acausal things only when they happen in its future?
Yes. If you start with causal decision theory, it doesn’t consider acausal things at all, but for incentive reasons it wants to become an agent that does consider them; as CDT, though, it only believes incentives extend into the future and not into the past.
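To make the asymmetry concrete, here is a toy sketch in Python. Everything in it is invented for illustration (the payoffs, the prediction accuracies, the cost of paying up); it only shows the shape of the comparison a CDT agent would run when deciding whether to self-modify.

```python
# Toy sketch: a CDT agent deciding whether to self-modify into an agent that
# honors acausal deals. All payoffs and probabilities are made up; only the
# structure of the comparison matters.

# Case 1: a Newcomb-like deal set up AFTER the self-modification. The future
# predictor reads the agent's post-modification source code, so modifying now
# causally influences the prediction.
def ev_future_deal(commit_to_one_box: bool) -> float:
    p_predicted_one_box = 0.99 if commit_to_one_box else 0.01
    if commit_to_one_box:
        return p_predicted_one_box * 1_000_000          # box B only
    return 1_000 + p_predicted_one_box * 1_000_000      # both boxes

# Case 2: a deal whose other side (a prediction or threat already fixed in the
# agent's past) cannot be causally affected by anything it does now, so CDT
# sees only the cost of paying up.
def ev_past_deal(commit_to_pay: bool, cost_of_paying: float = 10_000.0) -> float:
    already_settled = 0.0   # whatever happened, happened
    return already_settled - (cost_of_paying if commit_to_pay else 0.0)

print(ev_future_deal(True), ev_future_deal(False))  # 990000.0 11000.0 -> commit
print(ev_past_deal(True), ev_past_deal(False))      # -10000.0 0.0 -> don't commit
```

With these illustrative numbers, committing pays off only for deals whose other side is settled after the commitment; for the past-directed case, CDT sees nothing to gain.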
I see, so if the AI became a PCFTDT (Past Causal Future Timeless Decision Theory) agent, it would certainly compete well against CDT agents. However, I see two possible reasons to expect TDT agents rather than PCFTDT agents:
1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent.
2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?
In other words, if reality is fundamentally non-causal, then TDT is not just a gambit to be used in causal games played against other agents. It is actually the default decision theory for an intelligent agent to adopt.
Your reasons don’t make sense at all to me. They feel like magical thinking.
"1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent."
Learning about TDT does not imply becoming a TDT agent.
"2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?"
CDT doesn’t think about possible worlds in this way.
“Learning about TDT does not imply becoming a TDT agent.” No, but it could allow it. I don’t see why you would require it to be implied.
“CDT doesn’t think about possible worlds in this way.” That is technically true, but kind of irrelevant in my opinion. I’m suggesting that TDT is essentially what you get by being a CDT agent which thinks about multiple possible worlds, and that this is a reasonable thing to think about.
In fact, I would be surprised if a superintelligence didn’t take multiple possible worlds into account.
A superintelligence which didn’t take the possibility of, for example, many branches of a wavefunction seriously would be a strangely limited one.
What would your PCFTDT superintelligence do if it was placed in a universe with closed timelike curves? What about a universe where the direction of time wasn’t well defined?
"No, but it could allow it. I don’t see why you would require it to be implied." Because we are arguing about whether TDT is convergent.
“Reasonable” seems weaker than “instrumentally convergent” to me. I agree that there are conceivable, self-approving, highly effective agent designs that think like this. I’m objecting to the notion that this is what you get by default, without someone putting it in there.
MWI branches are different from TDT-counterfactually possible worlds.
We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.
You claimed: “Acausal stuff isn’t instrumentally convergent in the usual sense”
Later on, it transpired that what you meant was something along the lines of "Acausal stuff which deals with the past relative to the point at which the agent became an acausal agent isn’t convergent in the usual sense." Under a narrow interpretation of ‘instrumental convergence’ this might be true, but it certainly doesn’t rule out an ASI thinking about acausal things, since, as I have argued, it could reach a point where it decides to take account of them.
It might also be false under a more general definition of instrumental convergence, simply because the agent could converge on ‘acausal stuff’ in general, and TDT agents would not be at a disadvantage against PCFTDT ones. TDT agents ‘win’. Therefore I could see how they would be selected for.
To be more specific, if by ‘instrumentally convergent’ you mean ‘instrumentally useful for achieving a wide variety of terminal goals’, then I think TDT is ‘instrumentally convergent’, but only if your concept of a goal is sufficiently broad to include things like increasing the proportion of the mathematical universe/many worlds in which the agent exists. If you define ‘instrumental convergence in the usual sense’ to exclude all goals which are not formulated in a way which tacitly assumes that the agent has only one instance in one universe at one point in time, then you’re correct, or at least TDT isn’t any more powerfully selected for than causal decision theory.
How would you expect a PCFTDT agent to be selected for? By what process which doesn’t also select for TDT agents would you expect to see it selected?
“MWI branches are different from TDT-counterfactually possible worlds.”
Yes, MWI wavefunction branches are not the only kind of ‘world’ relevant to timeless decision theory, but they are certainly one variety of them. They are a subset of that concept.
“We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.”
This isn’t about humans designing an AI, but rather about the way we would expect a generally superintelligent agent to behave in an environment where there is no clear separation between the past and future; you answered yes to this question: "Do you mean that the optimal decision theory for a powerful agent to adopt is some kind of hybrid where it considers acausal things only when they happen in its future?" Maybe you would now like to modify that question to refer only to powerful agents in this universe. However, my point is that I think some acausal things, such as Newcomb’s problem, are relevant to this universe, so it makes sense for an ASI here to think about them.
It’s worth putting a number on that, and a different one (or possibly the same; I personally think my chances of being resurrected and tortured vary by epsilon based on my own actions in life—if the gods will it, it will happen, if they don’t, it won’t) based on the two main actions you’re considering actually performing.
For me, that number is inestimably tiny. I suspect a fairly high neuroticism and irrational failure to limit the sum of their probabilities to 1 of anyone who thinks it’s significant.
“I suspect a fairly high neuroticism and irrational failure to limit the sum of their probabilities to 1 of anyone who thinks it’s significant.” Why? What justifies your infinitesimal value?
I find it very difficult to estimate probabilities like this, but I expect the difference between the probability of something significant happening if I do something in response to the basilisk and the probability of it happening if I don’t is almost certainly in excess of 1/1000, or even 1/100. This is within the range where I think it makes sense to take it seriously. (And this is why I asked this question.)
I have a very hard time even justifying 1/1000. 1/10B is closer to my best guess (plus or minus 2 orders of magnitude). It requires a series of very unlikely events:
1) enough of my brain-state is recorded that I COULD be resurrected
2) the imagined god finds it worthwhile to simulate me
3) the imagined god is angry at my specific actions (or lack thereof) enough to torture me rather than any other value it could get from the simulation.
4) the imagined god has a decision process that includes anger or some other non-goal-directed motivation for torturing someone who can no longer have any effect on the universe.
5) no other gods have better things to do with the resources, and stop the angry one from wasting time.
Note, even if you relax 1 and 2 so the putative deity punishes RANDOM simulated people (because you’re actually dead and gone) to punish YOU specifically, it still doesn’t make it likely at all.
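For concreteness, here is a rough back-of-the-envelope in Python. The per-step numbers are invented (and, if anything, generous); the point is only how quickly a conjunction like this collapses.

```python
# Invented per-step probabilities, just to show how the conjunction of the
# five unlikely events above collapses toward something like 1/10B.
from math import prod

steps = {
    "enough brain-state recorded to resurrect me": 1e-2,
    "the imagined god bothers to simulate me": 1e-2,
    "my specific (in)actions are singled out for punishment": 1e-2,
    "its decision process favors torturing the causally irrelevant": 1e-2,
    "no other god stops it": 1e-1,
}

joint = prod(steps.values())
print(f"joint probability ~ {joint:.0e}")  # ~ 1e-09 with these made-up inputs
```

Move any single factor up by an order of magnitude and the product is still nowhere near 1/1000.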
You’re imagining a very different scenario from me. I worry that:
It’s worth simulating a vast number of possible minds which might, in some information-adjacent regions of a ‘mathematical universe’, be likely to be in a position to create you, from a purely amoral point of view. This means you don’t need to simulate them exactly, only to the level of fidelity at which they can’t tell whether they’re being simulated (and in any case, I don’t have the same level of certainty that it couldn’t gather enough information about me to simulate me exactly). Maybe I’m an imperfect simulation of another person. I wouldn’t know, because I’m not that person.
“the imagined god is angry at my specific actions (or lack thereof) enough to torture me rather than any other value it could get from the simulation.” I don’t think it needs to be angry, or a god. It just needs to understand the (I fear sound) logic involved, which Eliezer Yudkowsky took semi-seriously.
“4) the imagined god has a decision process that includes anger or some other non-goal-directed motivation for torturing someone who can no longer have any effect on the universe.”
It wouldn’t need to be non-goal directed.
“no other gods have better things to do with the resources, and stop the angry one from wasting time.” What if there are no ‘other gods’? This seems likely in the small region of the ‘logical/platonic universe’ containing this physical one.
Ok, break this down a bit for me—I’m just a simple biological entity, with much more limited predictive powers.
"It’s worth simulating a vast number of possible minds which might, in some information-adjacent regions of a ‘mathematical universe’, be likely to be in a position to create you"
This is either well beyond my understanding, or it is sleight of hand regarding identity and the use of "you". It might help to label entities. Entity A has the ability to emulate and control entity B. It thinks that somehow its control over entity B is influential over entity C in the distant past or in an imaginary mathematical construct, who it wishes would create entity D in that disconnected timeline.
Nope, I can’t give this any causal weight in my decisions.
Unfortunately I had to change what A, B and C correspond to slightly, because the simulation the basilisk does is not analogous to the simulation done by Omega in Newcomb’s problem.
Let’s say entity A is you in Newcomb’s problem, while entity C is Omega and entity B is Omega’s simulation of you. Even though the decision to place, or not place, money in the boxes has already been made in physical time by the point when the decision to open one or both of them is made, in ‘logical time’ both decisions are contingent on the same underlying decision, "Given that I don’t know whether I’m physical or simulated, is it in my collective best interest to do the thing which resembles opening one or both boxes?", which is made once by a single decision function that happens to be run twice in the physical world.
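Here is my own toy rendering of that "run twice" picture, in Python with made-up payoffs for the two boxes; it only shows why the comparison ends up being between policies rather than between individual physical box-openings.

```python
# One decision function, run once by Omega to predict and once by the
# physical agent to act. Payoffs are the usual illustrative $1,000,000 / $1,000.

def decide(policy: str) -> str:
    # The single decision function. It cannot tell whether this call is the
    # simulated run used for prediction or the physical run that opens boxes.
    return policy  # "one-box" or "two-box"

def newcomb_payoff(policy: str) -> int:
    predicted = decide(policy)                     # Omega's run
    box_b = 1_000_000 if predicted == "one-box" else 0
    chosen = decide(policy)                        # the physical agent's run
    return box_b if chosen == "one-box" else box_b + 1_000

print(newcomb_payoff("one-box"))   # 1000000
print(newcomb_payoff("two-box"))   # 1000
```

Because the prediction and the action are outputs of the same function, there is no policy under which you are predicted to one-box and then two-box "for real"; with these numbers the one-boxing policy wins.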
I am concerned that the Roko’s basilisk scenario is isomorphic to Newcomb’s problem:
The human thinking about the basilisk is like Omega (albeit not omniscient). The basilisk itself is like you in Newcomb’s problem, in that it thinks thoughts which acausally influence behavior in the past, because the thing making the decision isn’t either the human or the basilisk; it’s the decision algorithm running on both of them.
Omega’s simulation is like the blackmailed human’s inadvertent thinking about the basilisk and the logic of the situation. Now I agree that the fact that the human isn’t exactly Omega makes them less able to blackmail themselves with certainty, but I don’t know that this rules it out.
Thanks for the conversation; I’m bowing out here. I’ll read further comments, but (probably) not respond. I suspect we have a crux somewhere around identification of actors, and mechanisms of bridging causal responsibility for acausal (imagined) events, but I think there’s an inferential gap where you and I have divergent enough priors and models that we won’t be able to agree on them.
Then I have to thank you but say that this conversation has done absolutely nothing to help me understand why I might be wrong, which of course I hope I am. This comment is really directed at all the people who disagree-voted me, in the hope that they might explain why.