I believe the section on decision theory is misguided in several ways. In particular, I don’t see FDT as a critical error. However, I should note that I’m not an expert on decision theory, so take my opinion with a grain of salt.
(I generally agree with the statements “Eliezer is excessively overconfident” and “Eliezer has a poor epistemic track record”. Specifically, I believe that Eliezer holds several incorrect and overconfident beliefs about AI, which, from my perspective, seem like significant mistakes. However, I also believe that Eliezer has a commendable track record of intellectual outputs overall, just not a strong epistemic or predictive one. And, I think that FDT seems like a reasonable intellectual contribution and is perhaps our best guess at what decision theory looks like for optimal agents.)
I won’t spend much time advocating for FDT, but I will address a few specific points.
Schwarz argues the first problem with the view is that it gives various totally insane recommendations. One example is a blackmail case. Suppose that a blackmailer will, every year, blackmail one person. There’s a 1 in a googol chance that he’ll blackmail someone who wouldn’t give in to the blackmail and a (googol-1)/googol chance that he’ll blackmail someone who would give in to the blackmail. He has blackmailed you. He threatens that if you don’t give him a dollar, he will reveal all of your most embarrassing secrets to everyone in the world. Should you give in?
FDT would say no. After all, agents who won’t give in are almost guaranteed to never be blackmailed. But this is totally crazy. You should give up one dollar to prevent all of your worst secrets from being spread to the world. As Schwarz says:
FDT says you should not pay because, if you were the kind of person who doesn’t pay, you likely wouldn’t have been blackmailed. How is that even relevant? You are being blackmailed. Not being blackmailed isn’t on the table. It’s not something you can choose.
(I think you flipped the probabilities in the original post. I flipped them to what I think is correct in this block quote.)
I believe that what FDT does here is entirely reasonable. The reason it may seem unreasonable is that we’re assuming extreme levels of confidence. Intuitively it seems implausible that you shouldn’t give in to the blackmail, but all of that implausibility is carried by the 1-in-a-googol probabilities you stipulated. This hypothetical also assumes that the FDT reasoner assigns 100% probability to always following FDT in any counterfactual, an assumption that can probably be relaxed (though this may be challenging due to unresolved issues in decision theory?).
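To make that concrete, here’s a minimal expected-loss sketch under the post’s framing that your policy determines how likely you are to be targeted. The $1 payment comes from the scenario; the cost C of having your secrets revealed is an illustrative number I’m assuming, not something from the original post.

```python
# Minimal sketch: expected loss of the two fixed policies, given the
# blackmailer's stated selection odds. The cost C of having your secrets
# revealed is an assumed illustrative number.

GOOGOL = 10 ** 100
C = 10 ** 6  # assumed disutility of having your secrets revealed

def expected_loss(policy_pays: bool) -> float:
    """Expected loss for an agent whose policy is fixed before the blackmailer picks a target."""
    if policy_pays:
        # The blackmailer almost always targets people who would pay.
        p_blackmailed = (GOOGOL - 1) / GOOGOL
        loss_if_blackmailed = 1  # pay the dollar
    else:
        # People who would refuse are almost never targeted.
        p_blackmailed = 1 / GOOGOL
        loss_if_blackmailed = C  # secrets get revealed
    return p_blackmailed * loss_if_blackmailed

print(expected_loss(policy_pays=True))   # ~1.0
print(expected_loss(policy_pays=False))  # ~1e-94, effectively zero
```

On these stipulated numbers the refuser’s expected loss is astronomically smaller, which is all FDT is tracking; the intuition that refusing is crazy is pushing against the 1-in-a-googol stipulation, not against FDT.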
For an intuitive understanding of why this is reasonable, imagine the blackmailer simulates you to understand your behavior, and you’re almost certain they don’t blackmail people who ignore blackmail in the simulation. Then, when you’re blackmailed, your epistemic state should be “Oh, I’m clearly in a simulation. I won’t give in so that my real-world self doesn’t get blackmailed.” This seems intuitively reasonable to me, and it’s worth noting that Causal Decision Theory (CDT) would do the same, provided you don’t have indexical preferences. The difference is that FDT doesn’t differentiate between simulation and other methods of reasoning about your decision algorithm.
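For the CDT claim specifically, here’s a rough causal expected-cost sketch. The numbers and setup are my own assumptions beyond the comment above: you care only about what happens to the real-world copy (non-indexical preferences), paying costs $1, the secrets being revealed costs C, and q is your credence that you are the blackmailer’s simulation rather than the real agent.

```python
# Rough sketch: CDT with non-indexical preferences in the simulation variant.
# Assumptions (mine): you only care about the real-world copy; q is your
# credence that you are the simulation; paying costs $1; revealed secrets cost C.

def causal_expected_cost(refuse: bool, q: float, C: float) -> float:
    if refuse:
        # If you are the simulation (prob q), refusing causes the blackmailer
        # not to blackmail the real copy: cost 0.
        # If you are the real copy (prob 1 - q), refusing means the secrets
        # get revealed: cost C.
        return (1 - q) * C
    # Paying: if you are the simulation, the real copy gets blackmailed and,
    # running the same algorithm, pays $1; if you are the real copy, you pay $1.
    return q * 1 + (1 - q) * 1

# If refusers are almost never blackmailed for real, then upon being blackmailed
# you should think you're almost certainly the simulation:
print(causal_expected_cost(refuse=True, q=1 - 1e-9, C=1e6))   # ~0.001
print(causal_expected_cost(refuse=False, q=1 - 1e-9, C=1e6))  # 1.0
```

The point is just that once you’re nearly certain the blackmailer screens people by simulating them, even CDT with these preferences refuses.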
In fact, I find it absurd that CDT places significant importance on whether entities reasoning about the CDT reasoner will use simulation or some other reasoning method; intuitively, this seems nonsensical!
The basic point is that Yudkowsky’s decision theory is completely bankrupt and implausible, in ways that are evident to those who know about decision theory. It is much worse than either evidential or causal decision theory.
I think it’s worth noting that both evidential decision theory (EDT) and causal decision theory (CDT) seem quite implausible to me. Optimal agents following either decision theory would self-modify into something else to perform better in scenarios like Transparent Newcomb’s Problem.
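As a toy illustration of that self-modification pressure (my own sketch, not from any of the linked posts; I’m assuming the usual $1,000 and $1,000,000 payoffs): in Transparent Newcomb the predictor fills the big box only if it predicts you’ll take just that box even when you can see it’s full, so a fixed one-boxing policy earns more on average, and a CDT or EDT agent that could rewrite its decision procedure before being predicted would want to.

```python
# Toy Transparent Newcomb payoff comparison. Assumed payoffs: the small box
# always holds $1,000; the predictor fills the $1,000,000 box iff it predicts
# the agent will take only that box even upon seeing it full.

def average_payoff(policy_one_boxes: bool, predictor_accuracy: float = 0.99) -> float:
    """Average payoff for an agent whose fixed policy is known to the predictor."""
    p_big_box_full = predictor_accuracy if policy_one_boxes else 1 - predictor_accuracy
    if policy_one_boxes:
        return p_big_box_full * 1_000_000        # takes the big box only
    return p_big_box_full * 1_000_000 + 1_000    # takes both boxes

print(average_payoff(policy_one_boxes=True))   # 990000.0
print(average_payoff(policy_one_boxes=False))  # 11000.0
# An agent that could self-modify before being predicted would therefore
# prefer the one-boxing policy, whatever decision theory it started with.
```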
I think decision theories are generally at least counterintuitive, so this isn’t a unique problem with FDT.
I think your points are both addressed by the point MacAskill makes: perhaps in some cases it’s best to be the type of agent that follows functional decision theory. Sometimes rationality will be bad for you—if there’s a demon who tortures all rational people, for example. And as Schwarz points out, in the twin case, you’ll get less utility by following FDT—you don’t always want to be an FDTist.
I find your judgment about the blackmail case crazy! Yes, agents who give in to blackmail do worse on average. Yes, you want to be the kind of agent who never gives in to blackmail. But all of those are consistent with the obvious truth that giving in to blackmail, once you’re in that scenario, makes things worse for you and is clearly irrational.
Sometimes rationality will be bad for you—if there’s a demon who tortures all rational people, for example
At some point this gets down to semantics. I think a reasonable question to answer is “what decision rule should be chosen by an engineer who wants to build an agent scoring the most utility across its lifetime?” (quoting from Schwarz). I’m not sure if the answer to this question is well described as rationality, but it seems like a good question to answer to me. (FDT is sort of an attempted answer to this question if you define “decision rule” somewhat narrowly.)
Suppose that I beat up all rational people so that they get less utility. This would not make rationality irrational. It would just mean that the world is bad for the rational. The question you’ve described might be a fine one, but it’s not what philosophers are arguing about in Newcomb’s problem. If Eliezer claims to have revolutionized decision theory, and then doesn’t even know enough about decision theory to know that he is answering a different question from the decision theorists, that is an utter embarrassment that significantly undermines his credibility.
And in that case, Newcomb’s problem becomes trivial. Of course if Newcomb’s problem comes up a lot, you should design agents that one-box—they get more average utility. The question is about what’s rational for the agent to do, not what’s rational for it to commit to, become, or what’s rational for its designers to do.
And as Schwarz points out, in the twin case, you’ll get less utility by following FDT—you don’t always want to be an FDTist.
I can’t seem to find this in the linked blog post. (I see discussion of the twin case, but not a case where you get less utility from precommitting to follow FDT at the start of time.)
I find your judgment about the blackmail case crazy!
What about the simulation case? Do you think CDT with non-indexical preferences is crazy here also?
More generally, do you find the idea of legible precommitment to be crazy?
Sorry, I said twin case, I meant the procreation case!
The simulation case seems relevantly like the normal twin case which I’m not as sure about.
Legible precommitment is not crazy! Sometimes it is rational to agree in advance to do the irrational thing in some future case. If you have the ability to make it so that you won’t later change your mind, you should do that. But once you’re in that situation, it makes sense to defect.
As far as I can tell, the procreation case isn’t defined well enough in Schwarz for me to engage with it. In particular, in what exact way are my father’s decision and mine entangled? (Just saying the father follows FDT isn’t enough.) But I do think there is going to be a case basically like this where I bite the bullet. Notably, so does EDT.
Your father followed FDT and had the same reasons to procreate as you. He is relevantly like you.
That would mean that he believed he had a father with the same reasons, who believed he had a father with the same reasons, who believed he had a father with the same reasons...
I.e., this would require an infinite line of forefathers. (Or at least of hypothetical, believed-in forefathers.)
If anywhere there’s a break in the chain — that person would not have FDT reasons to reproduce, so neither would their son, etc.
Which makes it disanalogous to any cases we encounter in real life. And makes me more sympathetic to the FDT reasoning, since it’s a stranger case where I have less strong pre-existing intuitions.
...which makes the Procreation case an unfair problem. It punishes FDT’ers specifically for following FDT. If we’re going to punish decision theories for their identity, no decision theory is safe. It’s pretty wild to me that @WolfgangSchwarz either didn’t notice this or doesn’t think it’s a problem.
A more fair version of Procreation would be what I have called Procreation*, where your father follows the same decision theory as you (be it FDT, CDT or whatever).
Cool, so you maybe agree that CDT agents would want to self-modify into something like FDT agents (if they could). Then I suppose we might just disagree on the semantics behind the word “rational”.
(Note that CDT agents don’t exactly self-modify into FDT agents, just something close.)