“I should dodge the anvil because, if I get destroyed, I won’t do good things in the future”. She can still have a hard-coded aversion to being destroyed, but this seems like a hack.
I think it’s worse than that. If your argument is correct, the type of AI you are describing can’t plan at all, because it can’t trust its future selves to follow through with the plan, even when following through requires no precommitment. It would be limited to immediate utility maximization.
It seems that your argument proves too much. In fact, I think that Lucy can still prove useful properties about her successors even without proving the consistency of their, and therefore her own, proof system.
UDT assumes a system for assigning probabilities to mathematical sentences given a (possibly inconsistent) theory. For UDT to admit naturalistic trust, that system needs to overcome the Loebian obstacle, like the system I propose here. Indeed, the system I propose is intended as a component of UDT or UIM.
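For context (this statement is not in the original exchange, just a reminder of the theorem behind the obstacle): Löb's theorem says that whenever a theory $T$ proves that provability of $P$ implies $P$, it already proves $P$ outright, so an agent cannot use “if I prove it, it’s true” as a free trust principle:

```latex
% Loeb's theorem, external and internalized forms,
% where \Box_T denotes provability in T:
\vdash_T (\Box_T P \rightarrow P) \;\Longrightarrow\; \vdash_T P
\qquad\text{and}\qquad
\vdash_T \Box_T(\Box_T P \rightarrow P) \rightarrow \Box_T P
```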
That’s not how UDT works in my understanding, though I’ll admit I’m not an expert on the subject. Do you have a reference?
We can avoid this problem if Lucy performs an action only when it is the first step of a provably “good” sequence of actions. This would allow her to dodge the anvil if it interferes with her immediate plans, but not on the general grounds of “a universe with Lucy is a better universe since Lucy is doing good things”.
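A toy sketch of this decision rule (everything here is a stand-in I made up for illustration: a real agent would use proof search over a formal system, not a utility threshold over simulated plans):

```python
from itertools import product

def first_action_of_good_plan(actions, simulate, utility, horizon, threshold):
    """Return an action only if it begins some action sequence whose
    simulated total utility clears `threshold` (a toy stand-in for
    'provably good'); otherwise return None and take no action."""
    best = None
    for plan in product(actions, repeat=horizon):
        total = utility(simulate(plan))
        if total >= threshold and (best is None or total > best[1]):
            best = (plan[0], total)
    # Only the FIRST step of a certified plan is executed now;
    # the remainder is re-derived at the next time step.
    return None if best is None else best[0]
```

The key feature is that the agent never needs to trust its future self unconditionally: it commits only to the first step, and the rest of the plan must be re-certified later.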
I don’t have a reference that discusses UDT and the Loebian obstacle together. You can find a description of the AFAIK “latest and greatest” UDT here. UDT considers proofs in a formal system, and if that system suffers from the Loebian obstacle, this will lead to the kind of problems I discuss here. In fact, I haven’t stated it explicitly, but I think of Lucy as a UDT agent: she considers possible actions as logical counterfactuals and computes expected utility on that basis.
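Very roughly, and leaving out everything that makes UDT hard (the logical-counterfactual machinery and the proof search), the decision rule being gestured at looks like this sketch, where all names are illustrative:

```python
def udt_style_choice(actions, worlds, prior, utility):
    """Pick the action a maximizing sum over worlds w of
    prior(w) * utility(w, a).  In real UDT, utility(w, a) would be
    obtained by evaluating the logical counterfactual 'my code
    outputs a' in world w; here it is just a supplied function."""
    def expected(a):
        return sum(prior(w) * utility(w, a) for w in worlds)
    return max(actions, key=expected)
```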