A chooser of decisions is an algorithm, in the same way that computing 71+12 is an algorithm. It’s instantiated on some kind of substrate, such as the physical body of a human or a datacenter, similarly to how the computation of 71+12 gets instantiated in a calculator. The more fundamental thing doing the choosing is the choosing-algorithm (rather than the physical body of a human or a datacenter), similarly to how the thing that determines 83 as the right answer is the abstract arithmetic fact that 71+12=83 (rather than any particular calculator).
I see what you’re saying now, but I think that’s an exotic interpretation of decision-theoretic problems.
The classical framing is: “you’re faced with options A, B, C… at time t, which one should you pick?”
This is different from “what algorithm would you prefer to implement, one that picks A, B, or C at time t?”.
Specifically, picking an algorithm is like allowing yourself to make a decision/pre-commitment before time t, which wasn’t allowed in the original problem.
You are not picking an algorithm. The point is that the picker is an algorithm (already, and it’s already “picked”): you are an algorithm, and that algorithm picks the action. Even when an algorithm gets no further external input, and its result is fully determined by the algorithm and so can’t change, its result is still determined by the algorithm and not by anything else; it’s the algorithm that chooses the result, and the result isn’t (logically) chosen before the algorithm chooses it. Since an algorithm is an abstract mathematical thing, the fact that its instances are present at particular times and places within some physical world is not relevant to the point that the algorithm determines its result.
I think that’s an exotic interpretation of decision-theoretic problems
(What I’m sketching is a standard MIRI/LW way of seeing this.)
you are an algorithm, and that algorithm picks the action. Even when an algorithm gets no further external input, and its result is fully determined by the algorithm and so can’t change, its result is still determined by the algorithm and not by anything else; it’s the algorithm that chooses the result
This notion of “choice”, though perhaps reasonable in some cases, seems incorrect for decision theory, where the idea is that, until the point when you make the decision, you could (logically and physically) go for some other option.
If you think of yourself as carrying out a predetermined algorithm, there’s no choice or decision to make or discuss. Maybe, in some sense, “you decided” to one-box, but this is just a question of definitions. You could not have decided otherwise, making all questions of “what should you decide” moot.
Further, if you’re even slightly unsure whether you’re carrying out a predetermined algorithm, you can apply the logic I discussed:
If my actions are predetermined by an algorithm, it doesn’t matter what I feel like I’m “choosing”
If my actions are not predetermined by an algorithm, I should choose whatever improves my current position (i.e., two-box)
What I’m sketching is a standard MIRI/LW way of seeing this
Sure. But as far as I can tell, that way of seeing decision theory is denying the notion of real choice in the moment and saying “actually all there is is what type of algorithm you are and it’s best to be a UDT/FDT/etc. algorithm.”
But no-one is arguing that being a hardcoded one-boxer is worse than being a hardcoded two-boxer.
The nontrivial question is: how do you become an FDT agent? The MIRI view implies you can choose to be an FDT agent so that you win in Newcomb’s problem. But how? The only way this could be true is if you can pre-commit to one-boxing etc. before the fact, or if you started out on Earth as an FDT agent. Since most people can rule out the latter empirically, we’re back to square one, where MIRI is smuggling the ability to pre-commit into a problem that doesn’t allow it.
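For concreteness, here is a minimal sketch of the payoff fact granted above (that a hardcoded one-boxer does better than a hardcoded two-boxer), under the usual assumptions of a perfect predictor and the standard $1,000,000 / $1,000 boxes; the function is purely illustrative:

```python
# Minimal sketch (not from either comment): payoffs in the standard Newcomb
# setup, assuming a perfect predictor, an opaque box containing $1,000,000
# iff one-boxing was predicted, and a transparent box always holding $1,000.

def payoff(policy: str) -> int:
    """Payoff for an agent whose fixed policy the predictor forecasts correctly."""
    opaque = 1_000_000 if policy == "one-box" else 0
    transparent = 1_000
    return opaque if policy == "one-box" else opaque + transparent

print(payoff("one-box"))   # 1000000 -- the hardcoded one-boxer ends up richer
print(payoff("two-box"))   # 1000
```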
Consider what “choice” centrally is, in the usual senses, when a person makes a choice. If you upload that person to a datacenter and have them “make a choice” in that same sense while running on computers as an algorithm, you get all these properties of “not having been able to choose otherwise” just as well. This illustrates that choice isn’t about “being able to choose otherwise” in some sense that is absent in a datacenter; it’s about deciding what to choose, the happenings within an algorithm that computes the choice. The algorithm is predetermined, but that doesn’t make the algorithm’s result/choice/decision already known to the algorithm, so the algorithm still has to do all the choosing/deciding the hard way; being predetermined doesn’t affect the details of how the algorithm goes about computing its result. If that algorithm is a person, computing the result is centrally what it means for that person to think and choose/decide, in all the usual senses.
Even when an algorithm is fixed, and there are predictors that already know what its result is, the algorithm itself doesn’t necessarily know what its result is before it decides what it’s going to be. If the algorithm somehow gets to know what the result is, it can produce a different result on purpose, diagonalizing this knowledge and thereby making it invalid. If the agent knows that the result is 5, it can choose 6 on purpose, by construction breaking any claim that the result is 5, no matter how authoritative the source of that claim. So any predeterminedness doesn’t affect the computation that’s making the decision; it only exists as an abstract fact about omniscient predictors, who don’t get to communicate their knowledge to sufficiently quarrelsome agents on pain of losing it, and it doesn’t affect the actual process of making a decision. Also, the decision in particular determines what all the omniscient predictors believe.
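A minimal sketch of that diagonalization move (my own illustration; the values 5 and 6 are just the ones from the paragraph above):

```python
# A toy agent that diagonalizes against any announced prediction of its own
# output: told "your result is 5", it deliberately returns 6, so no prediction
# that is communicated to it can stay correct.

def diagonalizing_agent(announced_prediction=None):
    if announced_prediction is None:
        return 5                      # what the agent decides when left alone
    return announced_prediction + 1   # break the announced claim on purpose

print(diagonalizing_agent())    # 5 -- a silent oracle can predict this correctly
print(diagonalizing_agent(5))   # 6 -- an oracle that speaks up is now wrong
```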
The more traditional framing talks about sufficiently deterministic laws of physics and asks how choice works there. There is something about “requiredism” in the Sequences (a sense in which free will, in the usual central sense it’s used in everyday life, isn’t just compatible with the lawfulness and predictability of physics but requires it). That framing runs into technical problems with decision theory, which make TDT behave incorrectly in some cases; these get resolved in FDT/UDT by moving from an agent existing within physics to an agent existing as an abstract computation, running/coordinating things from the mathematical realm through all of its instances and all of the reasoning about it that occurs across the physical world.
So as I understand it, your (and the MIRI/LW) frame is that:
A choice is not made, it is “discovered” (i.e., the choice the agent is determined to make is revealed to it after it runs a certain “choice-making” procedure). This process is internally indistinguishable from “actually choosing” because we couldn’t know the result of the choice-making computation before doing it. However, an external system *could* know this, for example by simulating us.
There are certain choices we should be more or less happy to discover, or “make” in this sense. We should be more happy to have choice-making procedures that result in happy choices.
Correct decision theory specifies the best choice-making procedures.
I think the issue here is with moving down a level of abstraction to a new “map” in a way that makes the entire ontology of decision theory meaningless.
Yes, on some level, we are just atoms following the laws of physics. There are no “agents”, “identities”, or “decisions”. We can just talk about which configurations of atoms we prefer, and agree that we prefer the configuration of atoms where we get more money.
This is not the correct level for thinking about decision theory—we don’t think about any of our decisions that way. Decision theory is about determining the output of the specific choice-making procedure “consider all available options and pick the best one in the moment”. This is the only sense in which we appear to make choices—insofar as we make choices, those choices are over actions.
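In code, that in-the-moment procedure is essentially a greedy argmax over actions; here is a minimal sketch (my own, with `value` standing in for the agent’s estimate of what each action would cause, holding already-settled facts like box contents fixed):

```python
# Minimal sketch of "consider all available options and pick the best one in
# the moment": a greedy argmax over actions, evaluated by their causal
# consequences with everything already settled held fixed.

def choose(options, value):
    return max(options, key=value)

# With the opaque box's contents treated as already fixed (say it happens to
# hold $1,000,000), two-boxing causally dominates, so this procedure two-boxes.
value = {"one-box": 1_000_000, "two-box": 1_001_000}
print(choose(["one-box", "two-box"], value.get))  # two-box
```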
A choice is not made, it is “discovered” … choices we should be more or less happy to discover
The decision procedure is what’s making the choices. The diagonalization example was meant to illustrate that even an oracle predicts only at the pleasure of the decision procedure, and only the decision procedure gets to determine the choice. Nothing else gets to dictate to the decision procedure what the choice is; it’s not following a predetermined destiny, rather the destiny has no choice but to obey the decision procedure. Also, decision procedures are us: we are the decision procedures when making our own choices, not some external additional things.
Decision theory is about determining the output of the specific choice-making procedure “consider all available options and pick the best one in the moment”.
Sounds like a reasonable decision procedure. Except that most of the options are inevitably not actually chosen, not what actually gets to happen; that’s just how it is. You get to choose which one is actual, and you are free to do so as you wish, since nothing but you determines which one that is, and all the oracles and laws of physics and transistors have to comply with whatever you choose (because that’s just what it means to predict/instantiate/execute you correctly).
This is not the correct level for thinking about decision theory—we don’t think about any of our decisions that way. Decision theory is about determining the output of the specific choice-making procedure “consider all available options and pick the best one in the moment”.
Categorical imperative has been popular for a long while:
Act only according to that maxim whereby you can at the same time will that it should become a universal law.
I don’t think this is incompatible with making the best decision in the moment. You just decide in the moment to go with the more sophisticated version of the categorical imperative, because that seems best? If I didn’t reason like this, I would not vote, and I would have a harder time sticking to commitments. I agree that thinking about decisions in a way that is not purely greedy is complicated.
I think Rationalists have stumbled into reasonable beliefs about good strategies for iterated games/situations where reputation matters and people learn about your actions. But you don’t need exotic decision theories for that.
I address this in the post:
...makes sense under two conditions:
Their cooperative actions directly cause desirable outcomes by making observers think they are trustworthy/cooperative.
Being deceptive is too costly, either because it’s literally difficult (requires too much planning/thought), or because it makes future deception impossible (e.g. because of reputation and repeated interactions).
Of course, whether or not we have some free will, we are not entirely free—some actions are outside of our capability. Being sufficiently good at deception may be one of these. Hence why one might rationally decide to always be honest and cooperative—successfully only pretending to be so when others are watching might be literally impossible (and messing up once might be very costly).
How does your purely causal framing escape backward induction? Pure CDT agents defect in the iterated version of the prisoners’ dilemma too, since at the last time step you wouldn’t care about your reputation, and that reasoning then unravels cooperation at every earlier step.
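As a rough sketch of that unraveling (my own illustration, with the standard made-up payoffs T > R > P > S):

```python
# Backward induction in a finitely iterated prisoner's dilemma for a purely
# causal reasoner: in the known last round there is no reputation left to
# protect, so defection dominates (T > R and P > S); given that, the same
# argument applies one round earlier, and so on back to the first round.

T, R, P, S = 5, 3, 1, 0  # temptation > reward > punishment > sucker

def backward_induction_plan(rounds: int) -> list[str]:
    plan = []
    for _ in range(rounds):  # reasoning from the last round backwards
        # Later rounds are already settled by the induction, so only this
        # round's dominance comparison matters.
        plan.append("defect" if (T > R and P > S) else "cooperate")
    return plan

print(backward_induction_plan(5))  # ['defect', 'defect', 'defect', 'defect', 'defect']
```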
In conclusion, if you find yourself freely choosing between options, it’s rational to take a dominating strategy, like two-boxing in Newcomb’s problem or defecting in the prisoner’s dilemma. However, given the opportunity to actually pre-commit to decisions that get you better outcomes conditional on that pre-commitment, you should do so.
How do you tell whether you are in a “pre-commitment” situation or in a “defecting” situation?
It’s not exotic at all. It’s just a compatibilist interpretation of the term “free will”, and compatibilism is a pretty major class of positions on the subject.