It requires other people to think in enough depth to pick you out as a target. Admittedly this is made easier by the fact that you are posting about it online.
Have you thought in enough depth that you’ve helped the acausal extortionist to target other people? That may be evidence about whether other people have done so with you.
Acausal stuff isn’t instrumentally convergent in the usual sense, though. If you’re really good at computing counterfactuals, it may be instrumentally convergent to self-modify into or create an agent that does acausal deals, but the convergence only extends to deals that start in the future relative to where you’re deciding from.
Yes. If you start with causal decision theory (CDT), the agent doesn’t consider acausal things at all. For incentive reasons it wants to become an agent that does consider them, but as a CDT agent it believes those incentives extend only into the future and not the past.
Your reasons don’t make sense at all to me. They feel like magical thinking.
1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent.
Learning about TDT does not imply becoming a TDT agent.
2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?
CDT doesn’t think about possible worlds in this way.
“Learning about TDT does not imply becoming a TDT agent.” No, but it could allow it. I don’t see why you would require it to be an implication.
Because we are arguing about whether TDT is convergent.
“CDT doesn’t think about possible worlds in this way.” That is technically true, but kind of irrelevant in my opinion. I’m suggesting that TDT is essentially what you get by being a CDT agent which thinks about multiple possible worlds, and that this is a reasonable thing to think about.
“Reasonable” seems weaker than “instrumentally convergent” to me. I agree that there are conceivable, self-approving, highly effective agent designs that think like this. I’m objecting to the notion that this is what you get by default, without someone putting it in there.
In fact, I would be surprised if a superintelligence didn’t take multiple possible worlds into account.
A superintelligence which didn’t take seriously the possibility of, for example, many branches of a wavefunction would be a strangely limited one.
MWI branches are different from TDT-counterfactually possible worlds.
What would your PCFTDT superintelligence do if it was placed in a universe with closed timelike curves? What about a universe where the direction of time wasn’t well defined?
We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.
It’s worth putting a number on that, and a different one for each of the two main actions you’re actually considering performing. (Or possibly the same one: I personally think my chances of being resurrected and tortured vary by epsilon based on my own actions in life. If the gods will it, it will happen; if they don’t, it won’t.)
For me, that number is inestimably tiny. I suspect that anyone who thinks it’s significant has fairly high neuroticism and is irrationally failing to limit the sum of their probabilities to 1.
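To make the comparison between the two actions concrete, here is a minimal sketch of the arithmetic. Every number in it is a made-up placeholder rather than an estimate from this thread, and the names (`p_if_a`, `epsilon`, `cost`) are hypothetical. The only point is that the decision depends on the difference between the two probabilities, not on how large either one is.

```python
from fractions import Fraction


def expected_loss_difference(p_if_a, p_if_b, cost):
    """Extra expected loss from choosing action B instead of action A."""
    return (p_if_b - p_if_a) * cost


# Hypothetical placeholders, not estimates from the thread.
p_if_a = Fraction(1, 10**10)    # assumed chance of the bad outcome after action A
epsilon = Fraction(1, 10**15)   # assumed tiny dependence of that chance on my action
p_if_b = p_if_a + epsilon       # assumed chance of the bad outcome after action B
cost = 10**6                    # arbitrary disutility units for the bad outcome

diff = expected_loss_difference(p_if_a, p_if_b, cost)
print(float(diff))
```

Under these assumptions the base probability cancels out entirely: the difference is just `epsilon * cost`, so if your actions move the probability by only epsilon, they barely move the expected loss.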
I have a very hard time even justifying 1/1000. 1/10B is closer to my best guess (plus or minus 2 orders of magnitude). It requires a series of very unlikely events:
1) enough of my brain-state is recorded that I COULD be resurrected.
2) the imagined god finds it worthwhile to simulate me.
3) the imagined god is angry at my specific actions (or lack thereof) enough to torture me rather than any other value it could get from the simulation.
4) the imagined god has a decision process that includes anger or some other non-goal-directed motivation for torturing someone who can no longer have any effect on the universe.
5) no other gods have better things to do with the resources, and stop the angry one from wasting time.
Note, even if you relax 1 and 2 so the putative deity punishes RANDOM simulated people (because you’re actually dead and gone) to punish YOU specifically, it still doesn’t make it likely at all.
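As a rough illustration of how a chain like this gets down to the 1/10B range, here is a small sketch that multiplies per-step probabilities. The per-step values are hypothetical: the estimate above is only an overall guess, with no numbers given for the individual steps. Each value is meant as the chance of that step given that the earlier steps already hold, which is what makes plain multiplication valid.

```python
from fractions import Fraction
from math import prod

# Hypothetical conditional probabilities, one per step in the list above.
# These are illustrative assumptions, not figures from the comment.
steps = {
    "brain-state recorded well enough": Fraction(1, 100),
    "god finds simulating me worthwhile": Fraction(1, 100),
    "god is angry at my specific actions": Fraction(1, 100),
    "god's decision process allows non-goal-directed torture": Fraction(1, 100),
    "no other god stops it": Fraction(1, 100),
}

# Chain rule: the joint probability is the product of the conditionals.
joint = prod(steps.values())
print(f"joint probability: 1 in {joint.denominator // joint.numerator:,}")
```

With five steps at an assumed 1% each, the product is 1 in 10 billion. Raising or lowering every step by a modest factor moves the result by a couple of orders of magnitude, which fits the error bars stated above.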
Ok, break this down a bit for me—I’m just a simple biological entity, with much more limited predictive powers.
It’s worth simulating a vast number of possible minds which might, in some information-adjacent regions of a ‘mathematical universe’ be likely to be in a position to create you
This is either well beyond my understanding or is sleight-of-hand regarding identity and the use of “you”. It might help to label entities. Entity A has the ability to emulate and control entity B. It thinks that somehow its control over entity B is influential over entity C, who is in the distant past or in an imaginary mathematical construct, and who it wishes would create entity D in that disconnected timeline.
Nope, I can’t give this any causal weight in my decisions.
Thanks for the conversation; I’m bowing out here. I’ll read further comments, but (probably) not respond. I suspect we have a crux somewhere around the identification of actors and the mechanisms for bridging causal responsibility for acausal (imagined) events, but I think there’s an inferential gap: you and I have divergent enough priors and models that we won’t be able to agree on them.
This still requires people to design an AI that is prone to engaging in acausal extortion, and it’s unclear what their motive for doing so would be.