Firmament comments on DanielW’s Shortform

Firmament 15 Mar 2026 1:27 UTC
1 point
0
Hm, I guess it’s an empirical question then, of whether these situations happen in real life often enough to warrant using CDT or FDT-acting-like-CDT. I think FDT still wins out in the end, because FDT will emulate CDT if it realizes it lives in an FDT-hostile world (thus taking CDT-like actions out of an abundance of caution), while CDT has a harder time emulating FDT (CDT would need to use precommitment, while FDT does not).
I think that, in real life, humans are already near-optimal on this. People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT. Issues occur when a person’s community changes or if they move to a new community, but that problem is perhaps outside of the scope of decision theory.
So the moral of the story is “run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems”
And, of course, the point of LessWrong decision theory is to choose which one to give to a superintelligence (I think), since humans can’t actually choose their own decision theory (for the most part). And superintelligences probably won’t end up on the short side of an information asymmetry very often.
You could construct a problem that unfairly favors CDT instead, like “Newcomb’s Problem but, unbeknownst to you, if you one-box, you die”.
Oops, meant “unfairly favors FDT instead; if you two-box, you die”. It is indeed not very realistic, but it is possible to construct an isomorphic question in a more realistic setting, much like how Parfit’s Hitchhiker is a more realistic yet isomorphic framing of Newcomb’s Problem. So, again, it’s an empirical question of which decision theory is favored in situations that occur in real life.
Claude Opus 4.6 Extended weighs in with what I believe to be insightful and correct critique (oneshot/no selection; told to “focus on logical errors”; I’m “Poster 2″ and you’re “Poster 1”)
There are a few errors and confusions here that neither poster fully identifies:
The honesty parameter is doing most of the work, and it’s doing different work for each DT. Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment—without it, CDT-Will in town reasons “the ride already happened, paying has no causal effect on anything I care about” and pays $0. Under FDT, the honesty value is nearly irrelevant ($200 on top of $1,000,000 from counterfactual reasoning). So the comparison isn’t really “FDT vs CDT”—it’s “FDT’s counterfactual reasoning makes the full life-value extractable vs. CDT needs an exogenous honesty parameter bolted on to make any value extractable.” Poster 1 has quietly smuggled in a mechanism that makes CDT behave cooperatively for reasons that have nothing to do with CDT’s actual decision-theoretic properties, then compared the magnitude of extractable surplus.
If you set honesty to $0, CDT-Will pays nothing and Derek (knowing this) never stops. FDT-Will still pays up to $1M and Derek extracts near-maximal surplus. The comparison reverses in valence: now CDT-Will dies and FDT-Will lives. This is just… standard Parfit’s Hitchhiker again.
The “Will chooses his decision theory” argument is incoherent as a critique of FDT. Poster 1 frames it as a problem that FDT-Will, given the choice, would “choose to be a CDT agent.” But that’s just FDT working as intended. FDT says “implement whatever algorithm leads to best outcomes.” If in this specific information-asymmetric environment the CDT-like algorithm dominates, FDT recommends it. This isn’t FDT losing—it’s FDT being a meta-theory that can recommend CDT-like behavior when appropriate. CDT can’t do the reverse; it can’t say “in Newcomb-like problems, implement FDT-like behavior” because CDT lacks the counterfactual reasoning to even identify when that’s beneficial.
Poster 2 gestures at this (“FDT can emulate CDT”) but doesn’t make the sharper point that this “emulation” is just FDT functioning correctly at the meta-level.
The information asymmetry is contrived in a way that’s load-bearing but presented as “realistic.” Derek has perfect knowledge of Will’s utility function, decision theory, and honesty parameter, while Will has zero strategic information about Derek’s pricing. Poster 1 claims this is more realistic than Newcomb, but a scenario requiring one-sided perfect utility-function-level mind-reading is not obviously closer to real-world bargaining. In actual markets both parties have partial information and the dynamics look very different. The “Amazon knows more about you” analogy is a much weaker asymmetry than “Amazon knows your exact utility function and decision theory with arbitrarily high confidence.”
An underexplored FDT subtlety with partial information. Under FDT, Will reasons about what happens when copies/simulations of his algorithm are evaluated by other agents. Poster 1 claims Will can’t do this because he doesn’t know Derek is modeling him—but this is confused about how FDT works. FDT-Will doesn’t need to know the specific modeling; he reasons “I am implementing algorithm A. In all environments where A is evaluated—including by potential predictors—what are its outputs?” The question of whether FDT requires knowledge that you’re being modeled to apply counterfactual reasoning is a genuinely deep question about FDT’s foundations, and it’s where the real philosophical action is. Neither poster engages with it.
Basically: poster 1 has constructed a scenario where the honesty parameter does the heavy lifting for CDT and the information asymmetry does the heavy lifting against FDT, then concluded that CDT “wins”—but the lesson is closer to “bargaining power determines surplus distribution, and agents who can credibly commit get better deals in symmetric info but are more exploitable under asymmetric info.” Which is true but is a general feature of commitment, not a special failure of FDT.
- DanielW 15 Mar 2026 3:09 UTC
  2 points
  0
  Parent
  Hm, I guess it’s an empirical question then
  I sort of agree, but I don’t think it is one we can strictly answer. I gave some reasons we might think (in real world scenarios) that CDT tends to better explain behavior (e.g., Braess’s paradox), though I do not believe it is one we can have enough data to answer for all time.
  People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT.
  I disagree. I think the difference in high and low trust societies can largely be attributed to differences in utility functions and signaling effects. This appears empirically true (high-trust societies tend to have more information on each others activities, making signaling more meaningful, tend to poll as valuing trust, etc).
  Also, the problem isn’t “FDT-hostile” per-say. Derek is just trying to maximize his utility, he only cares what value he thinks he can get Will to honestly pay him. FDT does worse because it recommends taking the deal in Parfit’s hitchhiker and doesn’t have a posterior restraint on honest signaling.
  If Derek was a misanthrope, Will could lose under CDT. If Derek valued saving Will’s life at −195 (instead of +6), Derek would leave CDT Will to die and still save FDT Will. This is why I predicated that Derek has to be a decent person, he prefers a scenario where everyone wins to one where he wins and Will dies.
  Parfit’s Hitchhiker is a more realistic yet isomorphic framing of Newcomb’s Problem
  Minor nitpick but Parfit’s hitchiker is isomorphic under CDT and to Newcomb’s Problem with transparent boxes under both CDT and EDT but not Newcomb’s problem in general. CDT doesn’t care about evidentiary probability, but if you don’t know what is in the boxes EDT says you should act probabilistically.
  So the moral of the story is “run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems”
  I mean, I guess that is not unfair, but it seems bad. It would imply honest signals are good for CDT agents and dishonest signals are good for FDT agents.
  
  Also, Claude is wrong or missing the point.
  Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment
  Yes, this is the point. In the real world, for some value people are unwilling to cheat and for some value people are willing to cheat. The number of people, on seeing someone drop a $5 bill, that would return it to the person is greater than the people who seeing them drop a $100 bill would return it. This is a pretty trivial observation. Honesty in the real world is dependent on external signaling effects (i.e. people know others will be more honest with them if they are seen to be honest with others) and individual values (i.e., people are more honest because they ascribe some nebulous value to being honest, people who ascribe a greater value to honesty will be more honest than those who ascribe less of a value).
  If you set honesty to $0, CDT-Will pays nothing
  Derek still saves them, if you set all externalities to nothing, then yes. But the point of the hypothetical is to assume more normal human values. Most humans don’t like killing people and don’t like being dishonest.
  FDT-Will still pays up to $1M and Derek extracts near-maximal surplus.
  Which seems suboptimal. Extracting all of the surplus value for little effort doesn’t seem like a good thing. In society we ideally want to have parties be able to negotiate how to best divide the surplus according to various principles and social values. FDT-Will can be taken advantage of because he doesn’t have realistic human values. Similarly, in the typical formulation the driver can leave the hitchiker to die and the hitchhiker will be dishonest at any dollar figure because they have unrealistic utility functions.
  The information asymmetry is contrived in a way that’s load-bearing but presented as “realistic.”
  I mean, less so than in the original or in the inverse from Will’s perspective. I explained why I would argue it is generally realistic. It is an extreme case. In truly realistic scenarios, I would expect Derek to simply extract more from FDT-Will to a differing degree depending on how confident he was Will would pay him back. If Derek thought Will was CDT will, he would have to base that response on the estimated utility of Will paying him back when saved, which would be the $200. If he expected Will was FDT will, he would have to do so based on his estimate of Will’s functional utility for the total scenario, which would be $1,000,200. Realistically, it would be under $1,00,000 since he would expect Will to model some cut off to get a better deal. That cut off would fall between $0 and $1,000,200 depending on his relative estimates. If there was less asymmetry, it may end up that he would estimate he could only be reasonably confident that Will would take $5,000. But that would still have FDT-Will worse off since his bargaining position assumes a much larger stake than CDT-Will.
  As I said in my OP, my comment was to make more realistic but still keep most of the simplifying assumptions. You can add more variables to make it more realistic—the more you add the more complex it becomes to model.
  Poster 1 claims Will can’t do this because he doesn’t know Derek is modeling him
  That wasn’t what I was claiming. Did you have the same confusion? The problem Will has is he modelling to pay Derek or not based on the assumption that if he doesn’t pay Derek he would have been not saved (i.e., exactly how FDT models Will’s decision under the traditional Parfit’s dilemma).
  This necessitates he has some understanding of Derek’s decision making, if he had no reason to think Derek cared either way, he would be fine lying in both the original Parfit’s Dilemma and my ‘Decent Driver’ version.
  The question of whether FDT requires knowledge that you’re being modeled to apply counterfactual reasoning is a genuinely deep question about FDT’s foundations, and it’s where the real philosophical action is.
  No it isn’t, FDT plainly doesn’t require that. FDT gives different outputs if you are being modeled in so far as how the agent is being modeled affects how the agent assesses the values at issue.
  - Firmament 15 Mar 2026 5:22 UTC
    2 points
    0
    Parent
    I gave some reasons we might think (in real world scenarios) that CDT tends to better explain behavior (e.g., Braess’s paradox), though I do not believe it is one we can have enough data to answer for all time.
    Indeed, whatever humans do seems to be closer to CDT than other decision theories, although humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
    People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT.
    I disagree. I think the difference in high and low trust societies can largely be attributed to differences in utility functions and signaling effects. This appears empirically true (high-trust societies tend to have more information on each others activities, making signaling more meaningful, tend to poll as valuing trust, etc).
    The differences in utility functions are the humans’ way of implementing FDT (since FDT is too hard to reason about for evolution to instill it directly), and the signaling effects/mutual knowledge are what makes FDT worth it.
    Really, in this scenario, the fact that Will values honesty and promise-keeping means that CDT-Will is implementing a decision theory somewhere between CDT and FDT. FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200. Your argument seems to be that, due to uncertainties in real life, the optimal value to place on honesty is somewhere between zero and infinity, but not at either extreme. Which is true.
    Also, the problem isn’t “FDT-hostile” per-say. Derek is just trying to maximize his utility, he only cares what value he thinks he can get Will to honestly pay him. FDT does worse because it recommends taking the deal in Parfit’s hitchhiker and doesn’t have a posterior restraint on honest signaling.
    If Derek was a misanthrope, Will could lose under CDT. If Derek valued saving Will’s life at −195 (instead of +6), Derek would leave CDT Will to die and still save FDT Will. This is why I predicated that Derek has to be a decent person, he prefers a scenario where everyone wins to one where he wins and Will dies.
    I guess FDT-hostile is too strong a way to put it, since it implies the problem is an unfair problem. But as @papetoast said, there are some problems that FDT does better on, and some that CDT does better on (like the variant where Derek is misanthropic), and this one is one that CDT does better on.
    Parfit’s Hitchhiker is a more realistic yet isomorphic framing of Newcomb’s Problem
    Minor nitpick but Parfit’s hitchiker is isomorphic under CDT and to Newcomb’s Problem with transparent boxes under both CDT and EDT but not Newcomb’s problem in general. CDT doesn’t care about evidentiary probability, but if you don’t know what is in the boxes EDT says you should act probabilistically.
    Oops, I wasn’t aware that there was a distinction between the transparent-box version and the opaque-box version. Thank you for the correction.
    So the moral of the story is “run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems”
    I mean, I guess that is not unfair, but it seems bad. It would imply honest signals are good for CDT agents and dishonest signals are good for FDT agents.
    I am confused as to what you mean. CDT and FDT-emulating-CDT act the same, so they’re equally honest and get equally as much benefit from honesty. Is this about this specific problem? Or all similar problems? But this doesn’t seem to be load-bearing, so, whatever.
    Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment
    Yes, this is the point. In the real world, for some value people are unwilling to cheat and for some value people are willing to cheat. The number of people, on seeing someone drop a $5 bill, that would return it to the person is greater than the people who seeing them drop a $100 bill would return it. This is a pretty trivial observation. Honesty in the real world is dependent on external signaling effects (i.e. people know others will be more honest with them if they are seen to be honest with others) and individual values (i.e., people are more honest because they ascribe some nebulous value to being honest, people who ascribe a greater value to honesty will be more honest than those who ascribe less of a value).
    True, the $200 honesty-value seems to be there just to make CDT act more-FDT-like.
    If you set honesty to $0, CDT-Will pays nothing
    Derek still saves them
    Dang, you’re right, I really should have noticed that.
    But the point of the hypothetical is to assume more normal human values. Most humans don’t like killing people and don’t like being dishonest.
    Wait, does the $200 honesty-value actually matter here? It doesn’t seem like it changes the results of the hypothetical if you remove it, and removing it would make it easier to reason about.
    FDT-Will still pays up to $1M and Derek extracts near-maximal surplus.
    Which seems suboptimal. Extracting all of the surplus value for little effort doesn’t seem like a good thing. In society we ideally want to have parties be able to negotiate how to best divide the surplus according to various principles and social values. FDT-Will can be taken advantage of because he doesn’t have realistic human values. Similarly, in the typical formulation the driver can leave the hitchiker to die and the hitchhiker will be dishonest at any dollar figure because they have unrealistic utility functions.
    It seems that if Derek didn’t value honesty so highly that he wouldn’t stick to his first offer and they would be able to come to a fairer deal. But this would be bad for Derek.
    If tremendously valuing honesty is equivalent to FDT in this scenario (which it roughly seems to be but only because everyone makes a bunch of promises at the start in the desert), then in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic). So since Derek ends up with all the utility here, I guess FDT is good if you’re Derek and only bad if you’re Will. But I haven’t thought about this enough.
    As I said in my OP, my comment was to make more realistic but still keep most of the simplifying assumptions. You can add more variables to make it more realistic—the more you add the more complex it becomes to model.
    I was focused on the low-level claims so I neglected to paste that top-level comment into Claude’s chat; oops. Anyways, I don’t see the value in making the scenario complex. If the goal is to show a flaw in FDT, then that flaw will manifest in a simple scenario, which would be easier to reason about. But I guess if the goal is to show what should be done by a real human pragmatically, then complexity might be fine.
    Poster 1 claims Will can’t do this because he doesn’t know Derek is modeling him
    That wasn’t what I was claiming. Did you have the same confusion? The problem Will has is he modelling to pay Derek or not based on the assumption that if he doesn’t pay Derek he would have been not saved (i.e., exactly how FDT models Will’s decision under the traditional Parfit’s dilemma).
    Well, Derek is modeling Will on two levels. Derek is modeling what prices out of all possible prices Will would pay at, and Derek is modeling whether Will will pay the price that Derek actually decides. Will is only aware of the latter level, but isn’t aware of the price-setting that Derek was doing before. So Will can’t effectively leverage FDT, since he isn’t aware of that first level of modeling.
    The question of whether FDT requires knowledge that you’re being modeled to apply counterfactual reasoning is a genuinely deep question about FDT’s foundations, and it’s where the real philosophical action is.
    No it isn’t, FDT plainly doesn’t require that. FDT gives different outputs if you are being modeled in so far as how the agent is being modeled affects how the agent assesses the values at issue.
    I’m having trouble comprehending this and should probably get some sleep, but it would seem Claude is being weird and overconfident here so I hereby downgrade my overconfident endorsement of Claude’s outputs from “insightful and correct” to “looks right in some places but makes mistakes or is overconfident in other places”.
    - DanielW 15 Mar 2026 6:15 UTC
      2 points
      0
      Parent
      humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
      I don’t think it is approximating FDT. I think it is just different values. Laws and policies may make CDT agents approximate what FDT agents would do without those laws, but that is not what I mean. Real humans have complex sets of desires/utility functions.
      The differences in utility functions are the humans’ way of implementing FDT
      ...
      FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200.
      ...
      I think this may be a central confusion. You are misunderstanding the hypothetical somewhat. FDT-Will values honesty at $200. He and CDT-Will would both be willing to be dishonest if it got him a >$200 payoff. To take my prior example, if someone dropped a $100 bill, he would return it. But if they dropped >$200, he would pocket it. The reason he is willing to pay up to the value of his life +$200 is because his assessment of the value is not based on how he values honesty, it is only based on how he expect agents like himself to be treated.
      True, the $200 honesty-value seems to be there just to make CDT act more-FDT-like.
      ...
      Wait, does the $200 honesty-value actually matter here?
      ...
      Anyways, I don’t see the value in making the scenario complex.
      Out of order, but i think it is more relevant here. Absent the $200, Will is monetarily better off (since Derek would drive them back anyway). It is to show in real scenarios what factors the other party might use to determine how much to demand of the hitchhiker. In the original, why the driver asks for what he does is ignored. Realistically, people don’t set prices at random.
      The value of the $200 is meant to show price setting behavior in a more realistic CDT environment. It is not relevant to CDT winning. CDT wins in the given because Derek is a Decent Driver (hence the name). If Derek wasn’t a decent guy, CDT would still win if (and only if) they valued the signaling + honesty greater than Derek thought driving back was costly. Derek loses every dollar he values honesty more than Derek sees saving him as costly (though it doesn’t affect his actual expected value payoff, just amount that is actual cash money).
      But I agree on the complexity. I guess it wpuld have been better to first present Will as a simply agent with no external values and then show how he would behave under CDT with more realistic values. But the more realistic values are what I’d argue are more relevant for where CDT offers different policy implications.
      Oops, I wasn’t aware that there was a distinction between the transparent-box version and the opaque-box version. Thank you for the correction.
      No worries, the reason the original was interesting is that CDT estimates two-boxing maximizes expected value while EDT would estimate one boxing does. Both EDT and CDT in the transparent case would say two box. EDT says in the opaque case if you one box there is a 99% chance (or whatever probability you apply) of the opaque box having the money, so one-boxing works out a higher value. But if you can see what’s in the box, that is no longer a evidential problem so EDT says to two box and you get no difference.
      I am confused as to what you mean. CDT and FDT-emulating-CDT act the same, so they’re equally honest and get equally as much benefit from honesty. Is this about this specific problem? Or all similar problems? But this doesn’t seem to be load-bearing, so, whatever.
      I will have to think on this, but my first thought is that my previous point on honesty applies. In this scenerio, the CDT agent gets a better deal by signaling honestly they will act as if they value being honest about their payments at $200. The FDT agent can actually do better (as you said in your first response) if there isn’t asymmetry by acting as if they would never repay a payment. This implies to me a dishonest signal. But yea, it isn’t load bearing, it is somewhat my own intuition.
      in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic).
      Real life negotiation often is impossible or extremely costly. Negotiating prices after a hospital visit can cost many man hours. Most retailers won’t allow you to negotiate at all. Setting prices and comitting to them is a pretty conventional tactic. But if you allow negotiation Derek is still likely to get an outsized payment for something he would have done for free. Paying him seems to have some social utility (we want to encourage people to be decent and help others), but limiting it to amounts consisered feasible by social values and honest signals (as under CDT), seems likely to lead to better outcomes than msking the restraints equivilant to the total value of the interaction (which is the restraint under FDT).
      I’m having trouble comprehending this and should probably get some sleep, but it would seem Claude is being weird and overconfident here so I hereby downgrade my overconfident endorsement of Claude’s outputs from “insightful and correct” to “looks right in some places but makes mistakes or is overconfident in other places”.
      Fair enough, same here. Have a good night!
      - Firmament 15 Mar 2026 21:11 UTC
        2 points
        0
        Parent
        humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
        I don’t think it is approximating FDT. I think it is just different values. Laws and policies may make CDT agents approximate what FDT agents would do without those laws, but that is not what I mean. Real humans have complex sets of desires/utility functions.
        I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function. Like, CDT will one-box if it intrinsically values one-boxing in Newcomblike problems. So, a human with weird desires for justice and a human running FDT act the same if you squint.
        
        The differences in utility functions are the humans’ way of implementing FDT
        ...
        FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200.
        ...
        I think this may be a central confusion. You are misunderstanding the hypothetical somewhat. FDT-Will values honesty at $200. He and CDT-Will would both be willing to be dishonest if it got him a >$200 payoff. To take my prior example, if someone dropped a $100 bill, he would return it. But if they dropped >$200, he would pocket it. The reason he is willing to pay up to the value of his life +$200 is because his assessment of the value is not based on how he values honesty, it is only based on how he expect agents like himself to be treated.
        I’m making a specific claim about this specific scenario. I agree that both CDT-Will and FDT-Will
        will pick up $300 on the ground and keep it. But in this scenario, back in the desert, all parties
        involved hash out exactly what they’re going to do in the future. So if both Derek and Will were
        running CDT, but they were to value honesty at infinity dollars, then they would act exactly the
        same as if they both ran FDT but valued honesty at zero dollars. So the honesty parameter acts
        as a way to interpolate between CDT and FDT.
        in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic).
        Real life negotiation often is impossible or extremely costly. Negotiating prices after a hospital visit can cost many man hours. Most retailers won’t allow you to negotiate at all. Setting prices and comitting to them is a pretty conventional tactic. But if you allow negotiation Derek is still likely to get an outsized payment for something he would have done for free. Paying him seems to have some social utility (we want to encourage people to be decent and help others), but limiting it to amounts consisered feasible by social values and honest signals (as under CDT), seems likely to lead to better outcomes than msking the restraints equivilant to the total value of the interaction (which is the restraint under FDT).
        This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent. If the tables were turned, and we got rid of the “no negotiation allowed” rule, and Derek was a $0 honesty CDT agent and Will was an FDT agent (or alternatively a CDT agent with a high value on honesty), then Will could say “I precommit to not letting you drive me to to town unless you pay me $0.99 right now” (we assume Derek has money on him or that Will is capable of somehow paying Derek a negative amount in town) and then Derek would have no choice but to comply. And if instead both agents ran dishonest CDT, then words mean nothing and Derek would silently drive Will to town for the $1 altruism utility. So the moral of the story is that whoever runs FDT wins, with ties broken by whoever has the information advantage. The magnitude of “winning” is very different, because FDT-Derek-FDT-Will ends up netting Derek a million dollars while FDT-Derek-CDT-Will only gets Derek $199,99, but FDT-Derek wins nevertheless.
        
        DanielW 16 Mar 2026 1:42 UTC
        2 points
        0
        Parent
        I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function.
        I don’t think that follows. The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn’t seem very sensible, imo.
        So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
        In that case, Derek could demand infinite dollars from Will and Will would pay it.
        This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent
        If you remove Derek valuing honesty, his optimal decisions work out identically, as I daid in the OP. I made him value it trivially highly so I didn’t have to include a discussion of those scenarios to show they are suboptimal for Derek, but you can calculate his EV yourself, they will always be less than thr scenarios I described in the OP.
        If the tables were turned, and we got rid of the “no negotiation allowed” rule, and Derek was a $0 honesty
        Derek’s honesty value doesn’t affect those scenarios. In a negotiation, the turn order, information asymmetry etc determine who wins.
        So the moral of the story is that whoever runs FDT wins
        PEr the original problem, Derek’s optimal move is identical under CDT and FDT. How much he can get from Will is the only variable which depends on Will’s utility calculations.
        This isn’t a tiebreaker, it is what value they ascribe to different scenarios. Since CDT-Will’s posterior calculation is limited to his causal effects, the value that can be extracted from him is much lower.
        Firmament 16 Mar 2026 4:02 UTC
        1 point
        0
        Parent
        The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn’t seem very sensible, imo.
        What I mean is that you can think of “CDT agent with certain utility function” and “FDT agent” as exactly the same. They’re the same concept. So when you say “I don’t think it is approximating FDT. I think it is just different values.” I reply that “different values” and “approximating FDT” are the exact same thing, at least in the case where the mentioned “different values” are “justice, trust, and honor”, in my opinion.
        So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
        In that case, Derek could demand infinite dollars from Will and Will would pay it.
        Well, Will only values his life at a million dollars, so he would rather die than pay more than that. I admit that when I wrote that bit I was mentally conflating “one million” and “infinity” to simplify reasoning. Hopefully all the other shortcuts I’m using don’t break anything.
        Intuition says the “infinity” here comes from Derek and Will’s infinitely accurate predictions. As in, if the predictions were less than infinitely accurate, then you would need less than infinity dollars of honest-value to make CDT act like FDT. Dunno if that’s true and it doesn’t matter if it does, so, whatever.
        [the rest of the reply]
        I should have clarified more, oops. I was talking about a minor variation of the scenario where the “negotiation is not possible” restriction is lifted (while still keeping the information asymmetry somehow). In this case, with no other changes, the problem is basically the same, since Derek just says “btw I swear on God almighty that I am not negotiating at all, since this way I get the best outcomes” and then the rest of the scenario plays out the same (as long as we posit that Will’s memory of this exchange is magically erased and so FDT-Will doesn’t consider changing his behavior to get a better deal)
        And meanwhile if Derek’s $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
        Meanwhile if Derek has $0 honesty and is CDT and he says “btw no negotiation” then FDT-Will can say “no, screw you, we’re negotiating or I swear I will bury my head in the sand and die” and Derek will say “oh ok, i can tell that you will keep your promise, nevermind then, let’s negotiate”. FDT-Will then says “give me $0.99 and I’ll let you save my life” and poor CDT-Derek will agree.
        The FDT-Derek + FDT-Will case is probably important but it scares me and I don’t know how to reason about it. Probably with geometric utility. In this case, if we add a rule saying Derek gets 1 million dollars of utility from being alive, FDT-Derek pays 50 cents to FDT-Will to maximize the logarithm of utility, since Derek gets $1,000,000 + $0.50 and Will gets $1,000,000 + $0.50 and since these numbers are the same utility is maximized which is the best possible output for the function (we are ignoring the honesty-utility here)
        DanielW 16 Mar 2026 16:42 UTC
        1 point
        0
        Parent
        Also, putting this in another post since I think it is a major point, if we assume some cost to bargaining, for Derek it approximates something like a dove-hawk game, where Derek gets the first move. Will’s game is more complex as he is operating under information assymetry, so depends on the odds he assigns some probailities
        If we consider the value Will pays as X (negative if Derek pays Will), if we assume some cost (C) of both negotiating the outcome, the payoffs works out to (I don’t know how/if you can put tables into comments so I just have to write them out):
        Payoffs given FDT-Will with Negotiation:
        (1) Will accepts the initial offer (for FDT-Will, X = 1,000,199.99):
        Derek: 1 + 1,000,199.99
        Will: 0 * −1,000,000 + 1 * −999,999.99
        (2) Will Contests and Derek Accepts (say X = −0.99^[1]):
        Derek: 1 − 0.99
        Will: P(Derek rejects)*-1,000,0000 + P(Derek accepts)*(200.99) + P(Derek contests)*( (3) Will)
        (3) Will and Derek contest over X. X is unspecified under the assumptions, any number where X > (C − 1) and X < (1,000,200 - C) is feasible:
        Derek: 1 + X—C
        Will: P(Derek rejects)^[2]*-1,000,0000 + P(Derek accepts)*(200- X) - C
        Counterfactual: Derek doesn’t offer an amount and Will doesn’t contest (X = 0)
        Derek: 1
        Will: 1,000,200
        Payoffs given CDT-Will with Negotiation:
        (1) Will accepts the initial offer (for CDT-Will, X = 199.99, since anything greater wouldn’t be paid):
        Derek: 1 + 199.99
        Will: 0 * −1,000,000 + 1 * 0.01
        (2) Will Contests and Derek Accepts (X = −0.99):
        Derek: 1 − 0.99
        Will: P(Derek rejects)*-1,000,0000 + P(Derek accepts)*(200.99) + P(Derek contests)*( (3).Will)
        (3) Will and Derek contest over X. X is unspecified under the assumptions, any number where X > (C − 1) and X < (200 - C) is feasible:
        Derek: 1 + X—C
        Will: P(Derek rejects)*-1,000,0000 + P(Derek accepts)*(200- X) - C
        Counterfactual: Derek doesn’t offer an amount and Will doesn’t contest (X = 0)
        Derek: 1
        Will: 1,000,200
        While we would need to know Will’s probability estimates to actually model how they behave and what actions they take, from this it seems rather evident that under most approximations CDT-Will is still likely to be better off.
        ^
        Realistically, Will could set a value of X higher to decrease the chances of Derek contesting. But I am just assuming the extreme case here.
        ^
        These probabilities depend on the values of X. FDT-Will would estimate that as x approaches 1,000,199.99 P(Derek accepts) approaches 1.
        DanielW 16 Mar 2026 15:32 UTC
        1 point
        0
        Parent
        What I mean is that you can think of “CDT agent with certain utility function” and “FDT agent” as exactly the same. They’re the same concept.
        They are not. A CDT agent is fundamentally doing a different expected value calculation than a FDT agent. This is why they can lead to radically different outcomes.
        I should have clarified more, oops. I was talking about a minor variation of the scenario where the “negotiation is not possible” restriction is lifted (while still keeping the information asymmetry somehow).
        Okay, play out the scenario. He offers to take CDT will back for $199.99, what does will say? Will’s expected values are:
        “Ok”—payoff = $0.01
        “No”—payoff = -$1,000,000 (this is assuming an honest/total no, the other case is under 3)
        “No take me for X amount.”—payoff = ???^[1] (he doesn’t know whether Derek will accept or not, note that this includes the case where X is zero or the cases where X is negative).
        Now the question becomes “how does Will estimate the payoffs for 3?” What is his expectation for Derek to negotiate? Etc. If we assume sufficient risk aversion (which I would argue is the most probable outcome) 1 is still preferable.
        Let’s imagine Derek offers to take FDT will back for $1,000,199.99. Will’s expected values are:
        Be an agent that would say “OK”—payoff = -$999,999.99
        Be an agent that would say “No”—payoff = -$1,000,000
        Be an agent that would say “no take me for X amount”—payoff = ???
        Edit: Be an agent that would say “Ok” but not actually pay Derek—payoff = -$1,000,000 (I realize I forgot to include this one, as in Parfit’s Hitchiker, the agents’ expected outcome if they wouldn’t pay Derek honestly is that they would be left to die)^[2]
        FDT-Will has the same problem as CDT-Will. Though, for FDT-Will, unlike CDT-Will, I would argue under most reasonable assumptions there would be some preferable value for X under 3 that FDT-Will would estimate has a better expected value. Given that, he would try to negotiate a value somewhere between -$1 and $1,000,199.99. Where he would negotiate that value depends on risk aversions and how he estimates the responses from Derek.
        And meanwhile if Derek’s $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
        I am not convinced this is true. I don’t see why FDT-Derek would behave differently. If we assume information symmetry, then you get the same commitment race with their priors.
        
        FDT-Will then says “give me $0.99 and I’ll let you save my life”
        Why? See above. FDT-Will under your scenario still suffers from information asymmetry. You can argue that the 3 is reasonably the better option for him, but he has no idea what value of X is optimal. We know Derek considers any value >-$0.99 as an expected positive, but Will is operating in an asymmetric environment. He doesn’t know what Derek will decide. It seems reasonable that Will might expect Derek to accept some lower amount, but he is going to have to weigh that against the probability that Derek says “no.” If he has extreme risk aversion, he will still prefer 1 even if he estimates Derek would likely accept a lower price. If he has no risk aversion and Derek cannot counter offer, he will offer whatever he expects Derek to accept.
        and poor CDT-Derek will agree.
        Why? Let’s lay out CDT-Derek’s option.
        Agree—payoff = -$5.99
        Refuse—payoff = -$6.00
        Refuse and tell FDT-Will “I will only take you back for X amount” (where X is greater than −0.99) - payoff = unspecified (but known to Derek)
        It seems likely that CDT-Derek would pick some variant of 3, dependent on what he expects FDT-Will to react with which depends on FDT-Will’s estimation of CDT-Derek. You would expect to get some race with FDT-Will trying to determine Derek’s utility function. Indeed, if we assume perfect information asymmetry, CDT-Derek’s best move is probably to keep saying “I will only take you back for $1,000,199.99” to prevent FDT-Will from getting any information on his utility, if CDT-Derek repeats that until FDT-Will is about to die if he doesn’t make a decision, FDT-Will, having gained no information on the Derek’s utility, is likely to simply accept when he becomes unable to negotiate further (for the same reasons as above).^[3] And, making the standard FDT estimates when he gets to town (i.e., he anticipates if he wasn’t the kind of agent that would pay, he would have been left to die), he would honestly pay the $1,000,199.99.
        ^
        When I use ‘???’ I mean that it is both unspecified under the assumptions of the equations and unknown to the agent. We would need to add additional specifications to the problem to determine the expected payoff for different values of X, and without changing assumptions Will’s expected payoff from X would remain unknown to Will. We could add assumptions for Will’s estimates (which are not likely to be equivalent to the real payoffs), to determine what Will would estimate the expected payoffs are for different values of X.
        ^
        I am not including this option for the CDT agent since it is a strictly inferior version of 1, since their payout for honesty is $200 it is trivial that they would always be honest for under $199.99
        ^
        If there is no such cutoff, they are in a classic battle of the sexes type problem FDT-Will’s expected payoff from a deal is $1,000,200 - X, where Derek’s payoff is $1 + X. It is not clear what FDT-Will’s position would have to be for him to expect CDT-Derek to accept a better deal. Any deal from −0.99 to $1,000,199.99 is feasible under our assumptions (and would be a Nash Equilibrium) but we have no reason to expect any outcome in that range without adding some assumptions.
        Firmament 20 Mar 2026 4:03 UTC
        1 point
        0
        Parent
        I don’t have the time to give this full consideration, but on the whole I think you are correct if Will has the information asymmetry in both the negotiation phase and the payment phase, whereas I was implicitly assuming Will having full information in negotiation and suddenly gaining an information asymmetry in the payment phase (which doesn’t make much sense). So, yeah, I think I agree.