For any decision theory XDT, it is possible to construct a world where Omega gives you one bazillion utilons if you don’t follow XDT, and murders you if you do follow XDT. This is part of why these problems are called “unfair”.
That’s like saying that the Halting Problem isn’t an issue because problems that involve self-reference are unfair. You can’t just avoid the Halting Problem by saying “no explicit self-reference”, because seemingly reasonable stipulations that don’t explicitly have self-reference in them may imply it anyway.
It may turn out that for some decision theories, reasonable-seeming problems that don’t explicitly say “Omega punishes you if you follow XDT” may be equivalent to “Omega punishes you if you follow XDT” anyway.
That’s like saying that the Halting Problem isn’t an issue because problems that involve self-reference are unfair. You can’t just avoid the Halting Problem by saying “no explicit self-reference”, because seemingly reasonable stipulations that don’t explicitly have self-reference in them may imply it anyway.
If people hadn’t done roughly this, we would never have gotten the entire field of verifiable programs. Likewise, taken at face value, no-free-lunch theorems would be evidence that no brain can ever exist, and similar impossibility results would show that GPT-style language models cannot exist (it’s impossible to learn the rules of a formal language from only positive examples of that language).
Ruling out a class of things as “unfair” or “unrealistic” is sometimes necessary, for the same reason that Gödel’s incompleteness theorem shouldn’t stop you doing maths.
FDT is good because it works in the (relatively) un-contrived Newcomb’s problem, which is equivalent to the totally un-contrived Parfit’s Hitchhiker problem. If you want to extend decision theory to problems where the agent is deceived, or is punished because of following a particular decision theory, you find that you need to do some mixture of:
Accept more machinery, e.g. priors over possible settings to account for deception, or P(you are in a world where FDTers get directly killed by Omega) vs P(you are in a world where CDTers get directly killed by Omega), then weigh up the computational complexity of the different Omega scenarios and hash out the Solomonoff induction.
Give up and have no decision theory be better than any other.
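To make the first option a bit more concrete, here is a minimal Python sketch of ranking decision theories under a prior over worlds. The world names, probabilities, and payoffs are all made-up assumptions for illustration; nothing here is actually derived from Solomonoff induction.

```python
# Hypothetical prior over which world you are in (illustrative numbers only).
PRIOR = {
    "newcomb": 0.98,
    "omega_kills_fdt": 0.01,
    "omega_kills_cdt": 0.01,
}

DEATH = -1_000_000  # assumed utility of being killed by Omega

# Assumed payoff of following each decision theory in each world.
PAYOFF = {
    "newcomb": {"FDT": 1_000_000, "CDT": 1_000},
    "omega_kills_fdt": {"FDT": DEATH, "CDT": 0},
    "omega_kills_cdt": {"FDT": 0, "CDT": DEATH},
}


def expected_utility(theory, prior=PRIOR):
    """Prior-weighted payoff of following `theory` across all worlds."""
    return sum(p * PAYOFF[world][theory] for world, p in prior.items())


for theory in ("FDT", "CDT"):
    print(theory, expected_utility(theory))
```

With equal priors on the two punishing worlds, their contributions cancel when you compare the two theories, so the ordinary Newcomb-like world decides the ranking. An asymmetric prior would need some argument for why one kind of Omega is more likely than the other.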
If you want to extend decision theory to problems where the agent is deceived, or is punished because of following a particular decision theory,
The issue is that there may not be a choice. If you don’t want to extend decision theory to that kind of problem, exactly what are you going to do to exclude problems like that? It won’t necessarily have a line “if decision_theory == ‘XDT’” in it. It may not be very obvious, and it may not even be possible to determine, that some problem falls into a category that you want to exclude.
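A toy Python sketch of what that could look like. The `Agent` class, the `"probe"` scenario, and the two agents are hypothetical stand-ins, and the sketch assumes that XDT is the only theory that one-boxes in the probe scenario.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Agent:
    label: str                    # name of the decision theory (toy stand-in)
    decide: Callable[[str], str]  # maps a scenario name to an action


# Assumption: XDT always one-boxes, the rival theory YDT always two-boxes.
xdt = Agent("XDT", lambda scenario: "one-box")
ydt = Agent("YDT", lambda scenario: "two-box")


def explicit_omega(agent):
    """Punishes by name: the easy-to-spot 'unfair' problem."""
    return "killed" if agent.label == "XDT" else "rewarded"


def implicit_omega(agent):
    """Never mentions XDT; only looks at behaviour in a probe scenario."""
    return "killed" if agent.decide("probe") == "one-box" else "rewarded"


for agent in (xdt, ydt):
    print(agent.label, explicit_omega(agent), implicit_omega(agent))
```

Both Omegas treat these two agents identically, but only the first contains anything like a check on the decision theory's name. A rule that excludes problems by looking for such a check would miss the second one.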
No, what I mean is that there’s a symmetry between the setup “Omega kills you if you follow FDT for the crime of following FDT” and the setup “Omega kills you if you follow CDT for the crime of following CDT”, which isn’t necessarily present in setups like “Omega simulates you and then, based on your actions in the simulation, does X or Y”. There are other problems with the second kind of setup in some cases if Omega is allowed to lie, since this also allows symmetry into the system.
You can set up a version of Newcomb’s problem where Omega never lies, but you can’t do that for e.g. Newcomb’s revenge, since in that case Omega has to tell its simulation of you that you’re in the regular Newcomb’s problem.
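A minimal Python sketch of the honest version, assuming the usual $1,000,000 / $1,000 payoffs and modelling an agent as a function from a problem description to an action (both modelling choices are assumptions for illustration):

```python
def newcomb(agent):
    """Omega simulates the agent, fills the opaque box, then the real agent chooses."""
    # The simulation is told, truthfully, that it is in Newcomb's problem.
    predicted = agent("newcomb")
    opaque_box = 1_000_000 if predicted == "one-box" else 0
    # The real agent receives exactly the same description, so Omega never lies.
    actual = agent("newcomb")
    transparent_box = 1_000
    return opaque_box if actual == "one-box" else opaque_box + transparent_box


def one_boxer(problem):
    return "one-box"


def two_boxer(problem):
    return "two-box"


print(newcomb(one_boxer))  # 1000000
print(newcomb(two_boxer))  # 1000
```

The point of the sketch is that the simulated and the real agent are given the same string. In a setup where the simulation has to be told it is in the regular problem while the real agent is in a different one, the two calls would receive different descriptions, and that is where the lie enters.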
Actually, the halting problem (well, its generalization, Rice’s theorem) allows you to get a more precise intuition for why punishing agents iff they follow XDT is ‘unfair’ (it would be Turing-uncomputable for Omega to decide whether an agent follows XDT, even with his omniscience and infinite compute).