Yes; as in, if you start with causal decision theory, it doesn't consider acausal things at all, but for incentive reasons it wants to become an agent that does consider acausal things. As a CDT agent, though, it believes incentives extend only into the future, not into the past.
Your reasons don’t make sense at all to me. They feel like magical thinking.
1) By the time AI reaches superintelligence, it has already learnt TDT, at which point it has no reason to go back to being a PCFTDT agent.
Learning about TDT does not imply becoming a TDT agent.
2) What if the ASI reaches superintelligence with CDT, and then realizes that it can further increase the proportion of possible worlds in which it exists using TDT to effect something like acausal blackmail?
CDT doesn’t think about possible worlds in this way.
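To make the contrast concrete, here is a minimal sketch using Newcomb's problem, the standard example where the two theories diverge. The payoffs are the usual ones from the thought experiment; the predictor accuracy is an illustrative assumption.

```python
# Toy Newcomb's problem: how a CDT evaluation and a TDT-style
# evaluation diverge. Payoffs are the standard stakes; the
# predictor accuracy is an illustrative assumption.

ACCURACY = 0.99      # assumed predictor reliability
BIG = 1_000_000      # opaque box, filled iff one-boxing was predicted
SMALL = 1_000        # transparent box, always present

def cdt_value(action, p_box_full):
    """CDT: the box's contents are causally fixed before the choice,
    so the action cannot change p_box_full."""
    ev = p_box_full * BIG
    return ev + (SMALL if action == "two-box" else 0)

def tdt_value(action):
    """TDT-style: the predictor's output is correlated with the agent's
    decision procedure, so condition the contents on the action."""
    if action == "one-box":
        return ACCURACY * BIG
    return (1 - ACCURACY) * BIG + SMALL

# Whatever prior CDT holds about the box, two-boxing dominates:
for p in (0.0, 0.5, 1.0):
    assert cdt_value("two-box", p) > cdt_value("one-box", p)

# The TDT-style evaluation prefers one-boxing instead:
assert tdt_value("one-box") > tdt_value("two-box")
```

The point of the sketch is that CDT never updates `p_box_full` on its own action, so "thinking about possible worlds" in the TDT sense is a genuinely different evaluation rule, not something CDT already does.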
“Learning about TDT does not imply becoming a TDT agent.” No, but it could allow it. I don’t see why you would require it to be an implication.
Because we are arguing about whether TDT is convergent.
“CDT doesn’t think about possible worlds in this way.” That is technically true, but kind of irrelevant in my opinion. I’m suggesting that TDT is essentially what you get by being a CDT agent which thinks about multiple possible worlds, and that this is a reasonable thing to think about.
“Reasonable” seems weaker than “instrumentally convergent” to me. I agree that there are conceivable, self-approving, highly effective agent designs that think like this. I’m objecting to the notion that this is what you get by default, without someone putting it in there.
In fact, I would be surprised if a superintelligence didn’t take multiple possible worlds into account.
A superintelligence which didn’t take the possibility of, for example, many branches of a wavefunction seriously would be a strangely limited one.
MWI branches are different from TDT-counterfactually possible worlds.
What would your PCFTDT superintelligence do if it was placed in a universe with closed timelike curves? What about a universe where the direction of time wasn’t well defined?
We don’t seem to live in a universe like that, so it would be silly to prioritize good behavior in such universes when designing an AI.