I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function.
I don’t think that follows. The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn’t seem very sensible, imo.
So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
In that case, Derek could demand infinite dollars from Will and Will would pay it.
This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent.
If you remove Derek valuing honesty, his optimal decisions work out identically, as I said in the OP. I made him value it trivially highly so I didn’t have to include a discussion of those scenarios to show they are suboptimal for Derek, but you can calculate his EVs yourself; they will always be less than in the scenarios I described in the OP.
If the tables were turned, and we got rid of the “no negotiation allowed” rule, and Derek was a $0 honesty
Derek’s honesty value doesn’t affect those scenarios. In a negotiation, the turn order, information asymmetry, etc. determine who wins.
So the moral of the story is that whoever runs FDT wins
Per the original problem, Derek’s optimal move is identical under CDT and FDT. How much he can get from Will is the only variable which depends on Will’s utility calculations.
This isn’t a tiebreaker, it is what value they ascribe to different scenarios. Since CDT-Will’s posterior calculation is limited to his causal effects, the value that can be extracted from him is much lower.
The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn’t seem very sensible, imo.
What I mean is that you can think of “CDT agent with certain utility function” and “FDT agent” as exactly the same. They’re the same concept. So when you say “I don’t think it is approximating FDT. I think it is just different values.” I reply that “different values” and “approximating FDT” are the exact same thing, at least in the case where the mentioned “different values” are “justice, trust, and honor”, in my opinion.
So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
In that case, Derek could demand infinite dollars from Will and Will would pay it.
Well, Will only values his life at a million dollars, so he would rather die than pay more than that. I admit that when I wrote that bit I was mentally conflating “one million” and “infinity” to simplify reasoning. Hopefully all the other shortcuts I’m using don’t break anything.
Intuition says the “infinity” here comes from Derek and Will’s infinitely accurate predictions. As in, if the predictions were less than infinitely accurate, then you would need less than infinity dollars of honest-value to make CDT act like FDT. Dunno if that’s true and it doesn’t matter if it does, so, whatever.
[the rest of the reply]
I should have clarified more, oops. I was talking about a minor variation of the scenario where the “negotiation is not possible” restriction is lifted (while still keeping the information asymmetry somehow). In this case, with no other changes, the problem is basically the same, since Derek just says “btw I swear on God almighty that I am not negotiating at all, since this way I get the best outcomes” and then the rest of the scenario plays out the same (as long as we posit that Will’s memory of this exchange is magically erased and so FDT-Will doesn’t consider changing his behavior to get a better deal)
And meanwhile if Derek’s $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
Meanwhile if Derek has $0 honesty and is CDT and he says “btw no negotiation” then FDT-Will can say “no, screw you, we’re negotiating or I swear I will bury my head in the sand and die” and Derek will say “oh ok, i can tell that you will keep your promise, nevermind then, let’s negotiate”. FDT-Will then says “give me $0.99 and I’ll let you save my life” and poor CDT-Derek will agree.
The FDT-Derek + FDT-Will case is probably important, but it scares me and I don’t know how to reason about it. Probably with geometric utility. In this case, if we add a rule saying Derek gets 1 million dollars of utility from being alive, FDT-Derek pays 50 cents to FDT-Will to maximize the logarithm of utility: Derek gets $1,000,000 + $0.50 and Will gets $1,000,000 + $0.50, and since these numbers are equal, utility is maximized, which is the best possible output for the function (we are ignoring the honesty-utility here).
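The equal-split claim can be checked numerically. A minimal sketch, under my own assumptions (not spelled out above): each agent starts with $1,000,000 of utility, there is a fixed $1 of joint surplus to divide, and "geometric utility" means maximizing the product of the two agents' utilities. Integer cents are used so the comparison is exact:

```python
BASE_CENTS = 100_000_000   # $1,000,000 in cents
SURPLUS_CENTS = 100        # assumed: $1 of joint surplus to divide

def product_of_utilities(c):
    # geometric utility: maximizing the product of the two agents'
    # utilities is equivalent to maximizing the sum of their logs
    return (BASE_CENTS + c) * (BASE_CENTS + SURPLUS_CENTS - c)

# search every cent-sized split of the surplus
best_c = max(range(SURPLUS_CENTS + 1), key=product_of_utilities)
print(best_c)  # 50 -> the even $0.50 / $0.50 split maximizes the product
```

Exact integers matter here: at this scale the difference between adjacent splits is below floating-point resolution of the log-sum, so a float version could pick the wrong argmax.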
Also, putting this in another post since I think it is a major point: if we assume some cost to bargaining, for Derek it approximates something like a hawk-dove game, where Derek gets the first move. Will’s game is more complex, as he is operating under information asymmetry, so it depends on the probabilities he assigns to Derek’s responses.
If we consider the value Will pays as X (negative if Derek pays Will), and we assume some cost (C) to both of negotiating the outcome, the payoffs work out to (I don’t know how/if you can put tables into comments, so I just have to write them out):
Payoffs given FDT-Will with negotiation:

(1) Will accepts the initial offer (for FDT-Will, X = 1,000,199.99):
Derek: 1 + 1,000,199.99
Will: 0 * −1,000,000 + 1 * −999,999.99

(2) Will contests and Derek accepts (say X = −0.99[1]):
Derek: 1 − 0.99
Will: P(Derek rejects) * −1,000,000 + P(Derek accepts) * (200.99) + P(Derek contests) * ((3) Will)

(3) Will and Derek contest over X. X is unspecified under the assumptions; any number where X > (C − 1) and X < (1,000,200 − C) is feasible:
Derek: 1 + X − C
Will: P(Derek rejects)[2] * −1,000,000 + P(Derek accepts) * (200 − X) − C

Counterfactual: Derek doesn’t offer an amount and Will doesn’t contest (X = 0):
Derek: 1
Will: 1,000,200

Payoffs given CDT-Will with negotiation:

(1) Will accepts the initial offer (for CDT-Will, X = 199.99, since anything greater wouldn’t be paid):
Derek: 1 + 199.99
Will: 0 * −1,000,000 + 1 * 0.01

(2) Will contests and Derek accepts (X = −0.99):
Derek: 1 − 0.99
Will: P(Derek rejects) * −1,000,000 + P(Derek accepts) * (200.99) + P(Derek contests) * ((3) Will)

(3) Will and Derek contest over X. X is unspecified under the assumptions; any number where X > (C − 1) and X < (200 − C) is feasible:
Derek: 1 + X − C
Will: P(Derek rejects) * −1,000,000 + P(Derek accepts) * (200 − X) − C

Counterfactual: Derek doesn’t offer an amount and Will doesn’t contest (X = 0):
Derek: 1
Will: 1,000,200

While we would need to know Will’s probability estimates to actually model how they behave and what actions they take, from this it seems rather evident that under most approximations CDT-Will is still likely to be better off.

[1] Realistically, Will could set a higher value of X to decrease the chances of Derek contesting, but I am just assuming the extreme case here.
[2] These probabilities depend on the value of X. FDT-Will would estimate that as X approaches 1,000,199.99, P(Derek accepts) approaches 1.
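To make the comparison concrete, here is a minimal sketch plugging assumed numbers into the contest row of the tables above; the bargaining cost C and P(Derek rejects) are my stand-ins, not part of the original scenario:

```python
C = 10.0          # assumed cost of a contested negotiation
P_REJECT = 0.01   # assumed chance Derek rejects and Will dies

def contest_ev(x, p_reject=P_REJECT, c=C):
    # Will's expected payoff from row (3) of either table, contesting at X
    return p_reject * -1_000_000 + (1 - p_reject) * (200 - x) - c

fdt_accept = -999_999.99   # FDT-Will accepting the X = 1,000,199.99 offer
cdt_accept = 0.01          # CDT-Will accepting the X = 199.99 offer

# even at a very favorable X, contesting carries a death-risk term that
# swamps the gains; CDT-Will's accept option remains the best on the table
print(contest_ev(-0.99))   # roughly -9811
```

Under these stand-in numbers, CDT-Will accepting (+$0.01) beats the best contested outcome, while FDT-Will's accept option is far worse than contesting, which matches the conclusion above.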
What I mean is that you can think of “CDT agent with certain utility function” and “FDT agent” as exactly the same. They’re the same concept.
They are not. A CDT agent is fundamentally doing a different expected-value calculation than an FDT agent. This is why they can lead to radically different outcomes.
I should have clarified more, oops. I was talking about a minor variation of the scenario where the “negotiation is not possible” restriction is lifted (while still keeping the information asymmetry somehow).
Okay, play out the scenario. Derek offers to take CDT-Will back for $199.99; what does Will say? Will’s expected values are:

1. “Ok”: payoff = $0.01
2. “No”: payoff = −$1,000,000 (this assumes an honest/total no; the other case is under 3)
3. “No, take me for X amount.”: payoff = ???[1] (he doesn’t know whether Derek will accept or not; note that this includes the cases where X is zero or negative).

Now the question becomes: how does Will estimate the payoffs for 3? What is his expectation that Derek will negotiate? Etc. If we assume sufficient risk aversion (which I would argue is the most probable outcome), 1 is still preferable.
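One way to see the risk-aversion point: even for a risk-neutral CDT-Will, option 3 only beats option 1 if he is nearly certain Derek accepts. A sketch under the payoff model above (the counter-offer X = 0 and "rejection means death" are my assumptions):

```python
DEATH = -1_000_000   # payoff if Derek rejects and Will dies
ACCEPT_NOW = 0.01    # option 1: take the $199.99 offer

def counter_ev(p_accept, x=0.0):
    # option 3: counter-offer at X; rejection is assumed to mean death
    return p_accept * (200 - x) + (1 - p_accept) * DEATH

# probability of acceptance at which option 3 merely ties option 1
threshold = (ACCEPT_NOW - DEATH) / (200 - DEATH)
print(round(threshold, 4))  # 0.9998 -> Will must be almost certain
```

So under these assumptions, anything short of ~99.98% confidence that Derek accepts makes option 1 the better pick, before any risk aversion is even added.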
Let’s imagine Derek offers to take FDT-Will back for $1,000,199.99. Will’s expected values are:

1. Be an agent that would say “OK”: payoff = −$999,999.99
2. Be an agent that would say “No”: payoff = −$1,000,000
3. Be an agent that would say “No, take me for X amount”: payoff = ???
4. (Edit) Be an agent that would say “Ok” but not actually pay Derek: payoff = −$1,000,000 (I realize I forgot to include this one; as in Parfit’s Hitchhiker, an agent who wouldn’t honestly pay Derek expects to be left to die)[2]
FDT-Will has the same problem as CDT-Will. Though, unlike CDT-Will, I would argue that under most reasonable assumptions there would be some value of X under 3 that FDT-Will would estimate has a better expected value than accepting. Given that, he would try to negotiate a value somewhere between −$1 and $1,000,199.99. Where in that range he lands depends on his risk aversion and how he estimates Derek’s responses.
And meanwhile if Derek’s $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
I am not convinced this is true. I don’t see why FDT-Derek would behave differently. If we assume information symmetry, then you get the same commitment race with their priors.
FDT-Will then says “give me $0.99 and I’ll let you save my life”
Why? See above. FDT-Will under your scenario still suffers from information asymmetry. You can argue that 3 is reasonably the better option for him, but he has no idea what value of X is optimal. We know Derek considers any value > −$0.99 an expected positive, but Will is operating in an asymmetric environment; he doesn’t know what Derek will decide. It seems reasonable that Will might expect Derek to accept some lower amount, but he is going to have to weigh that against the probability that Derek says “no.” If he has extreme risk aversion, he will still prefer 1 even if he estimates Derek would likely accept a lower price. If he has no risk aversion and Derek cannot counter-offer, he will offer whatever he expects Derek to accept.
and poor CDT-Derek will agree.
Why? Let’s lay out CDT-Derek’s options:

1. Agree: payoff = −$5.99
2. Refuse: payoff = −$6.00
3. Refuse and tell FDT-Will “I will only take you back for X amount” (where X is greater than −$0.99): payoff = unspecified (but known to Derek)
It seems likely that CDT-Derek would pick some variant of 3, depending on how he expects FDT-Will to react, which in turn depends on FDT-Will’s estimation of CDT-Derek. You would expect to get some race, with FDT-Will trying to determine Derek’s utility function. Indeed, if we assume perfect information asymmetry, CDT-Derek’s best move is probably to keep saying “I will only take you back for $1,000,199.99” to prevent FDT-Will from getting any information on his utility. If CDT-Derek repeats that until FDT-Will is about to die unless he makes a decision, FDT-Will, having gained no information on Derek’s utility, is likely to simply accept once he becomes unable to negotiate further (for the same reasons as above).[3] And, making the standard FDT estimates when he gets to town (i.e., anticipating that if he weren’t the kind of agent that would pay, he would have been left to die), he would honestly pay the $1,000,199.99.
When I use ‘???’ I mean that it is both unspecified under the assumptions of the equations and unknown to the agent. We would need to add additional specifications to the problem to determine the expected payoff for different values of X, and without changing assumptions Will’s expected payoff from X would remain unknown to Will. We could add assumptions for Will’s estimates (which are not likely to be equivalent to the real payoffs), to determine what Will would estimate the expected payoffs are for different values of X.
I am not including this option for the CDT agent since it is a strictly inferior version of 1: their payout for honesty is $200, so it is trivial that they would always be honest for anything under $199.99.
If there is no such cutoff, they are in a classic battle-of-the-sexes-type problem. FDT-Will’s expected payoff from a deal is $1,000,200 − X, while Derek’s payoff is $1 + X. It is not clear what FDT-Will’s position would have to be for him to expect CDT-Derek to accept a better deal. Any deal from −$0.99 to $1,000,199.99 is feasible under our assumptions (and would be a Nash equilibrium), but we have no reason to expect any particular outcome in that range without adding some assumptions.
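The indeterminacy can be illustrated with a quick check: under assumed disagreement payoffs (Derek walks away with $0.01, Will dies at −$1,000,000; both are my stand-ins), every X in the stated range leaves both parties better off than no deal, so nothing in the setup itself selects a single deal:

```python
DEREK_NO_DEAL = 0.01        # assumed: Derek's payoff if no deal happens
WILL_NO_DEAL = -1_000_000   # assumed: Will dies without a deal

def deal_is_feasible(x):
    derek = 1 + x            # Derek's payoff from a deal at X
    will = 1_000_200 - x     # FDT-Will's payoff from a deal at X
    return derek > DEREK_NO_DEAL and will > WILL_NO_DEAL

# sample points across the -$0.99 .. $1,000,199.99 range
samples = [-0.98, 0.0, 100.0, 500_000.0, 1_000_199.0]
print(all(deal_is_feasible(x) for x in samples))  # True
```

Every sampled X passes, while a point outside the range (e.g. X = −$1.00, where Derek does worse than walking away) fails, which is the "continuum of equilibria" point above.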
I don’t have the time to give this full consideration, but on the whole I think you are correct if Will has the information asymmetry in both the negotiation phase and the payment phase, whereas I was implicitly assuming Will having full information in negotiation and suddenly gaining an information asymmetry in the payment phase (which doesn’t make much sense). So, yeah, I think I agree.