Was Kant implicitly using UDT?
Consider Kant’s categorical imperative. It says, roughly, that you should act such that you could will your action as a universal law without undermining the intent of the action. For example, suppose you want to obtain a loan for a new car and never pay it back—you want to break a promise. In a world where everyone broke promises, the social practice of promise keeping wouldn’t exist and thus neither would the practice of giving out loans. So you would undermine your own ends and thus, according to the categorical imperative, you shouldn’t get a loan without the intent to pay it back.
Another way to put Kant’s position would be that you should choose such that you are choosing for all other rational agents. What does UDT tell you to do? It says (among other things) that you should choose such that you are choosing for every agent running the same decision algorithm as yourself. It wouldn’t be a stretch to call UDT agents rational. So Kant thinks we should be using UDT! Of course, Kant can’t draw the conclusions he wants to draw because no human is actually using UDT. But that doesn’t change the decision algorithm Kant is endorsing.
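The contrast between "best-respond while holding everyone else fixed" and "choose as if you are choosing for every agent running your algorithm" can be sketched in a toy one-shot Prisoner's Dilemma. This is illustrative code only; the payoff table and function names are mine, and it is nothing like an actual implementation of UDT:

```python
# Toy illustration: payoff to the row player, (my_move, their_move) -> utility,
# using standard Prisoner's Dilemma values.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def causal_choice(their_fixed_move):
    """Hold the other player's move fixed and best-respond to it."""
    return max(["C", "D"], key=lambda my: PAYOFF[(my, their_fixed_move)])

def universalized_choice():
    """Assume every agent running this algorithm outputs the same move,
    so evaluate each move as if it were chosen by both players at once."""
    return max(["C", "D"], key=lambda my: PAYOFF[(my, my)])

# Best-responding defects no matter what the opponent is fixed to do...
assert causal_choice("C") == "D" and causal_choice("D") == "D"
# ...while "choosing for all agents like me" cooperates.
assert universalized_choice() == "C"
```

The Kantian flavor is in `universalized_choice`: the move is evaluated only in the world where it has been universalized, never as a unilateral deviation.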
Except… Kant isn’t a consequentialist. If the categorical imperative demands something, it demands it no matter the circumstances. Kant famously argued that lying is wrong, period. Even if the fate of the world depends on it.
So Kant isn’t really endorsing UDT, but I thought the surface similarity was pretty funny.
I remember Eliezer saying something similar, though I can’t find it right now (the closest I could find was this). It was something about the benefits of being the kind of person who doesn’t lie, even if the fate of the world is at stake. Because if you aren’t, the minute the fate of the world is at stake is the minute your word becomes worthless.
I recall it too. I think the key distinction is that if the choice were literally between lying and everyone in the world, including yourself, perishing, Kant would let us all die. Eliezer would not. What I took Eliezer to be saying (working from memory, I may try to find the post later) is that if you think the choice is between lying and the sun exploding (or something analogous) in any real-life situation… you’re wrong. Given what we know about humans, it’s far more likely that you’re rationalizing a compromise of your values than that the compromise is actually necessary. So a consequentialist system implies basically deontological rules once human nature is taken into account.
Once again, this is all from my memory, so I could be wrong.
Although Eliezer didn’t put it precisely in these terms, he was sort of suggesting that if one could self-modify so that it became impossible to break a certain sort of absolutely binding promise, it would be good to modify oneself in that way. That holds even though it would mean that, if a situation actually arose where you had to either break the promise or let the world perish, you would have to let the world perish.
I think the article you (and the parent comment) are talking about is this one.
Drescher has some important things to say about this distinction in Good and Real. What I got out of it is that the categorical imperative is justifiable on consequentialist or self-serving grounds, so long as you relax the constraint that you can only consider the causal consequences (or “means-end links”) of your decisions, i.e., things that happen “futureward” of your decision.
Drescher argues that specifically ethical behavior is distinguished by its recognition of these “acausal means-end links”, in which you act for the sake of what would be the case if-counterfactually you would make that decision, even though you may already know the result. (Though I may be butchering it—it’s tough to get my head around the arguments.)
And I saw a parallel between Drescher’s reasoning and UDT, as the former argues that your decisions set the output of all similar processes to the extent that they are similar.
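That “to the extent that they are similar” qualifier can be made concrete in a toy Prisoner's Dilemma: treat the other process as matching your output with some probability p, a crude stand-in for degree of similarity. This is my own illustration, not Drescher’s formalism, and all the names in it are invented:

```python
# Standard one-shot Prisoner's Dilemma payoffs to the row player.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def expected_payoff(my_move, p):
    """Expected payoff if the other process copies my move with probability p
    and outputs the opposite move otherwise."""
    other = "D" if my_move == "C" else "C"
    return p * PAYOFF[(my_move, my_move)] + (1 - p) * PAYOFF[(my_move, other)]

def choice(p):
    """Pick the move with the higher expected payoff at similarity level p."""
    return max(["C", "D"], key=lambda m: expected_payoff(m, p))

# A fully dissimilar process (p = 0) gives the classic defection answer;
# a near-copy of you (p = 1) makes cooperation the better gamble.
assert choice(0.0) == "D"
assert choice(1.0) == "C"
```

With these payoffs the two moves tie at p = 5/7, so cooperation only wins once the other process is quite strongly correlated with your own decision, which is the “to the extent that they are similar” part.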
I thought Kant sounded a lot more like TDT than UDT. Or was that what you meant?
I’m not familiar enough with Pearl’s formalism to really understand TDT, which is at least part of why I haven’t really delved into TDT yet. I’d love to hear why you think Kant sounds more like TDT, though. I suspect it has something to do with considering counterfactuals.
I’m not familiar at all with Pearl’s formalism. But from what I see on this site, I gather that the key insight of updateless decision theory is to maximize utility without conditioning on information about what world you’re in, and the key insight of timeless decision theory is what you’re describing (Eliezer summarizes it as “Choose as though controlling the logical output of the abstract computation you implement, including the output of all other instantiations and simulations of that computation.”)
I think Eliezer’s summary is also a fair description of UDT. The difference between UDT and TDT appears to be subtle, and I don’t completely understand it. From what I can tell, UDT simply chooses in the way Eliezer describes, ignoring any updating process entirely, whereas TDT arrives at such choices through how it reasons about counterfactuals. Somehow, TDT’s counterfactual reasoning causes it to choose slightly differently from UDT, but I’m not sure why at this point.