Firmament

Karma: 8

Firmament 20 Mar 2026 4:03 UTC
1 point
0
in reply to: DanielW’s comment on: DanielW’s Shortform
I don’t have the time to give this full consideration, but on the whole I think you are correct if Will has the information asymmetry in both the negotiation phase and the payment phase, whereas I was implicitly assuming Will having full information in negotiation and suddenly gaining an information asymmetry in the payment phase (which doesn’t make much sense). So, yeah, I think I agree.

Firmament 16 Mar 2026 4:02 UTC
1 point
0
in reply to: DanielW’s comment on: DanielW’s Shortform
The CDT and FDT agents have the same utility functions and behave differently. Of course, if you gave them different tailored utility functions you could get them to behave the same in any given case, but that doesn’t seem very sensible, imo.
What I mean is that you can think of “CDT agent with certain utility function” and “FDT agent” as exactly the same. They’re the same concept. So when you say “I don’t think it is approximating FDT. I think it is just different values.” I reply that “different values” and “approximating FDT” are the exact same thing, at least in the case where the mentioned “different values” are “justice, trust, and honor”, in my opinion.
So if both Derek and Will were running CDT, but they were to value honesty at infinity dollars,
In that case, Derek could demand infinite dollars from Will and Will would pay it.
Well, Will only values his life at a million dollars, so he would rather die than pay more than that. I admit that when I wrote that bit I was mentally conflating “one million” and “infinity” to simplify reasoning. Hopefully all the other shortcuts I’m using don’t break anything.
Intuition says the “infinity” here comes from Derek and Will’s infinitely accurate predictions. As in, if the predictions were less than infinitely accurate, then you would need less than infinity dollars of honest-value to make CDT act like FDT. Dunno if that’s true and it doesn’t matter if it does, so, whatever.
[the rest of the reply]
I should have clarified more, oops. I was talking about a minor variation of the scenario where the “negotiation is not possible” restriction is lifted (while still keeping the information asymmetry somehow). In this case, with no other changes, the problem is basically the same, since Derek just says “btw I swear on God almighty that I am not negotiating at all, since this way I get the best outcomes” and then the rest of the scenario plays out the same (as long as we posit that Will’s memory of this exchange is magically erased and so FDT-Will doesn’t consider changing his behavior to get a better deal)
And meanwhile if Derek’s $1,000,000 value on honesty is set to $0 BUT he uses FDT then the exact same thing happens absent any weird commitment-race dynamics with FDT-Will
Meanwhile if Derek has $0 honesty and is CDT and he says “btw no negotiation” then FDT-Will can say “no, screw you, we’re negotiating or I swear I will bury my head in the sand and die” and Derek will say “oh ok, i can tell that you will keep your promise, nevermind then, let’s negotiate”. FDT-Will then says “give me $0.99 and I’ll let you save my life” and poor CDT-Derek will agree.
The FDT-Derek + FDT-Will case is probably important but it scares me and I don’t know how to reason about it. Probably with geometric utility. In this case, if we add a rule saying Derek gets 1 million dollars of utility from being alive, FDT-Derek pays 50 cents to FDT-Will to maximize the logarithm of utility, since Derek gets $1,000,000 + $0.50 and Will gets $1,000,000 + $0.50 and since these numbers are the same utility is maximized which is the best possible output for the function (we are ignoring the honesty-utility here)

Firmament 15 Mar 2026 21:11 UTC
2 points
0
in reply to: DanielW’s comment on: DanielW’s Shortform
humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
I don’t think it is approximating FDT. I think it is just different values. Laws and policies may make CDT agents approximate what FDT agents would do without those laws, but that is not what I mean. Real humans have complex sets of desires/utility functions.
I mean, my intuition says that the right utility function can turn any CDT agent into an FDT agent, and any FDT agent can be described in terms of a CDT agent with a certain utility function. Like, CDT will one-box if it intrinsically values one-boxing in Newcomblike problems. So, a human with weird desires for justice and a human running FDT act the same if you squint.

The differences in utility functions are the humans’ way of implementing FDT
...
FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200.
...
I think this may be a central confusion. You are misunderstanding the hypothetical somewhat. FDT-Will values honesty at $200. He and CDT-Will would both be willing to be dishonest if it got him a >$200 payoff. To take my prior example, if someone dropped a $100 bill, he would return it. But if they dropped >$200, he would pocket it. The reason he is willing to pay up to the value of his life +$200 is because his assessment of the value is not based on how he values honesty, it is only based on how he expect agents like himself to be treated.
I’m making a specific claim about this specific scenario. I agree that both CDT-Will and FDT-Will
will pick up $300 on the ground and keep it. But in this scenario, back in the desert, all parties
involved hash out exactly what they’re going to do in the future. So if both Derek and Will were
running CDT, but they were to value honesty at infinity dollars, then they would act exactly the
same as if they both ran FDT but valued honesty at zero dollars. So the honesty parameter acts
as a way to interpolate between CDT and FDT.
in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic).
Real life negotiation often is impossible or extremely costly. Negotiating prices after a hospital visit can cost many man hours. Most retailers won’t allow you to negotiate at all. Setting prices and comitting to them is a pretty conventional tactic. But if you allow negotiation Derek is still likely to get an outsized payment for something he would have done for free. Paying him seems to have some social utility (we want to encourage people to be decent and help others), but limiting it to amounts consisered feasible by social values and honest signals (as under CDT), seems likely to lead to better outcomes than msking the restraints equivilant to the total value of the interaction (which is the restraint under FDT).
This seems like the crux of the issue. I think that the whole reason Will is in this mess is because Derek places a value of $1,000,000 on trust for some reason, making him act exactly like an FDT agent. If the tables were turned, and we got rid of the “no negotiation allowed” rule, and Derek was a $0 honesty CDT agent and Will was an FDT agent (or alternatively a CDT agent with a high value on honesty), then Will could say “I precommit to not letting you drive me to to town unless you pay me $0.99 right now” (we assume Derek has money on him or that Will is capable of somehow paying Derek a negative amount in town) and then Derek would have no choice but to comply. And if instead both agents ran dishonest CDT, then words mean nothing and Derek would silently drive Will to town for the $1 altruism utility. So the moral of the story is that whoever runs FDT wins, with ties broken by whoever has the information advantage. The magnitude of “winning” is very different, because FDT-Derek-FDT-Will ends up netting Derek a million dollars while FDT-Derek-CDT-Will only gets Derek $199,99, but FDT-Derek wins nevertheless.

Firmament 15 Mar 2026 5:22 UTC
2 points
0
in reply to: DanielW’s comment on: DanielW’s Shortform
I gave some reasons we might think (in real world scenarios) that CDT tends to better explain behavior (e.g., Braess’s paradox), though I do not believe it is one we can have enough data to answer for all time.
Indeed, whatever humans do seems to be closer to CDT than other decision theories, although humans use various concepts like justice, trust, and honor to approximate FDT occasionally.
People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT.
I disagree. I think the difference in high and low trust societies can largely be attributed to differences in utility functions and signaling effects. This appears empirically true (high-trust societies tend to have more information on each others activities, making signaling more meaningful, tend to poll as valuing trust, etc).
The differences in utility functions are the humans’ way of implementing FDT (since FDT is too hard to reason about for evolution to instill it directly), and the signaling effects/mutual knowledge are what makes FDT worth it.
Really, in this scenario, the fact that Will values honesty and promise-keeping means that CDT-Will is implementing a decision theory somewhere between CDT and FDT. FDT-Will effectively values honesty at infinity, while CDT-Will values it at $200. Your argument seems to be that, due to uncertainties in real life, the optimal value to place on honesty is somewhere between zero and infinity, but not at either extreme. Which is true.
Also, the problem isn’t “FDT-hostile” per-say. Derek is just trying to maximize his utility, he only cares what value he thinks he can get Will to honestly pay him. FDT does worse because it recommends taking the deal in Parfit’s hitchhiker and doesn’t have a posterior restraint on honest signaling.
If Derek was a misanthrope, Will could lose under CDT. If Derek valued saving Will’s life at −195 (instead of +6), Derek would leave CDT Will to die and still save FDT Will. This is why I predicated that Derek has to be a decent person, he prefers a scenario where everyone wins to one where he wins and Will dies.
I guess FDT-hostile is too strong a way to put it, since it implies the problem is an unfair problem. But as @papetoast said, there are some problems that FDT does better on, and some that CDT does better on (like the variant where Derek is misanthropic), and this one is one that CDT does better on.
Parfit’s Hitchhiker is a more realistic yet isomorphic framing of Newcomb’s Problem
Minor nitpick but Parfit’s hitchiker is isomorphic under CDT and to Newcomb’s Problem with transparent boxes under both CDT and EDT but not Newcomb’s problem in general. CDT doesn’t care about evidentiary probability, but if you don’t know what is in the boxes EDT says you should act probabilistically.
Oops, I wasn’t aware that there was a distinction between the transparent-box version and the opaque-box version. Thank you for the correction.
So the moral of the story is “run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems”
I mean, I guess that is not unfair, but it seems bad. It would imply honest signals are good for CDT agents and dishonest signals are good for FDT agents.
I am confused as to what you mean. CDT and FDT-emulating-CDT act the same, so they’re equally honest and get equally as much benefit from honesty. Is this about this specific problem? Or all similar problems? But this doesn’t seem to be load-bearing, so, whatever.
Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment
Yes, this is the point. In the real world, for some value people are unwilling to cheat and for some value people are willing to cheat. The number of people, on seeing someone drop a $5 bill, that would return it to the person is greater than the people who seeing them drop a $100 bill would return it. This is a pretty trivial observation. Honesty in the real world is dependent on external signaling effects (i.e. people know others will be more honest with them if they are seen to be honest with others) and individual values (i.e., people are more honest because they ascribe some nebulous value to being honest, people who ascribe a greater value to honesty will be more honest than those who ascribe less of a value).
True, the $200 honesty-value seems to be there just to make CDT act more-FDT-like.
If you set honesty to $0, CDT-Will pays nothing
Derek still saves them
Dang, you’re right, I really should have noticed that.
But the point of the hypothetical is to assume more normal human values. Most humans don’t like killing people and don’t like being dishonest.
Wait, does the $200 honesty-value actually matter here? It doesn’t seem like it changes the results of the hypothetical if you remove it, and removing it would make it easier to reason about.
FDT-Will still pays up to $1M and Derek extracts near-maximal surplus.
Which seems suboptimal. Extracting all of the surplus value for little effort doesn’t seem like a good thing. In society we ideally want to have parties be able to negotiate how to best divide the surplus according to various principles and social values. FDT-Will can be taken advantage of because he doesn’t have realistic human values. Similarly, in the typical formulation the driver can leave the hitchiker to die and the hitchhiker will be dishonest at any dollar figure because they have unrealistic utility functions.
It seems that if Derek didn’t value honesty so highly that he wouldn’t stick to his first offer and they would be able to come to a fairer deal. But this would be bad for Derek.
If tremendously valuing honesty is equivalent to FDT in this scenario (which it roughly seems to be but only because everyone makes a bunch of promises at the start in the desert), then in the scenario Derek is basically running FDT and using it to gain an advantage by precommitting to a single offer (the scenario explicitly says that negotiation is impossible, but in real life this would only happen if Derek is using some weird nonconventional negotiation tactics, and using a massive value on trust to precommit to the first offer made is one such tactic). So since Derek ends up with all the utility here, I guess FDT is good if you’re Derek and only bad if you’re Will. But I haven’t thought about this enough.
As I said in my OP, my comment was to make more realistic but still keep most of the simplifying assumptions. You can add more variables to make it more realistic—the more you add the more complex it becomes to model.
I was focused on the low-level claims so I neglected to paste that top-level comment into Claude’s chat; oops. Anyways, I don’t see the value in making the scenario complex. If the goal is to show a flaw in FDT, then that flaw will manifest in a simple scenario, which would be easier to reason about. But I guess if the goal is to show what should be done by a real human pragmatically, then complexity might be fine.
Poster 1 claims Will can’t do this because he doesn’t know Derek is modeling him
That wasn’t what I was claiming. Did you have the same confusion? The problem Will has is he modelling to pay Derek or not based on the assumption that if he doesn’t pay Derek he would have been not saved (i.e., exactly how FDT models Will’s decision under the traditional Parfit’s dilemma).
Well, Derek is modeling Will on two levels. Derek is modeling what prices out of all possible prices Will would pay at, and Derek is modeling whether Will will pay the price that Derek actually decides. Will is only aware of the latter level, but isn’t aware of the price-setting that Derek was doing before. So Will can’t effectively leverage FDT, since he isn’t aware of that first level of modeling.
The question of whether FDT requires knowledge that you’re being modeled to apply counterfactual reasoning is a genuinely deep question about FDT’s foundations, and it’s where the real philosophical action is.
No it isn’t, FDT plainly doesn’t require that. FDT gives different outputs if you are being modeled in so far as how the agent is being modeled affects how the agent assesses the values at issue.
I’m having trouble comprehending this and should probably get some sleep, but it would seem Claude is being weird and overconfident here so I hereby downgrade my overconfident endorsement of Claude’s outputs from “insightful and correct” to “looks right in some places but makes mistakes or is overconfident in other places”.

Firmament 15 Mar 2026 1:27 UTC
1 point
0
in reply to: DanielW’s comment on: DanielW’s Shortform
Hm, I guess it’s an empirical question then, of whether these situations happen in real life often enough to warrant using CDT or FDT-acting-like-CDT. I think FDT still wins out in the end, because FDT will emulate CDT if it realizes it lives in an FDT-hostile world (thus taking CDT-like actions out of an abundance of caution), while CDT has a harder time emulating FDT (CDT would need to use precommitment, while FDT does not).
I think that, in real life, humans are already near-optimal on this. People in low-trust FDT-hostile communities will grow up running CDT, while people in high-trust communities grow up running FDT. Issues occur when a person’s community changes or if they move to a new community, but that problem is perhaps outside of the scope of decision theory.
So the moral of the story is “run FDT, but emulate CDT if you live near a lot of information-asymmetric FDT-hostile problems”
And, of course, the point of LessWrong decision theory is to choose which one to give to a superintelligence (I think), since humans can’t actually choose their own decision theory (for the most part). And superintelligences probably won’t end up on the short side of an information asymmetry very often.
You could construct a problem that unfairly favors CDT instead, like “Newcomb’s Problem but, unbeknownst to you, if you one-box, you die”.
Oops, meant “unfairly favors FDT instead; if you two-box, you die”. It is indeed not very realistic, but it is possible to construct an isomorphic question in a more realistic setting, much like how Parfit’s Hitchhiker is a more realistic yet isomorphic framing of Newcomb’s Problem. So, again, it’s an empirical question of which decision theory is favored in situations that occur in real life.
Claude Opus 4.6 Extended weighs in with what I believe to be insightful and correct critique (oneshot/no selection; told to “focus on logical errors”; I’m “Poster 2″ and you’re “Poster 1”)
There are a few errors and confusions here that neither poster fully identifies:
The honesty parameter is doing most of the work, and it’s doing different work for each DT. Under CDT, the $200 honesty value is the entire mechanism by which Derek can extract payment—without it, CDT-Will in town reasons “the ride already happened, paying has no causal effect on anything I care about” and pays $0. Under FDT, the honesty value is nearly irrelevant ($200 on top of $1,000,000 from counterfactual reasoning). So the comparison isn’t really “FDT vs CDT”—it’s “FDT’s counterfactual reasoning makes the full life-value extractable vs. CDT needs an exogenous honesty parameter bolted on to make any value extractable.” Poster 1 has quietly smuggled in a mechanism that makes CDT behave cooperatively for reasons that have nothing to do with CDT’s actual decision-theoretic properties, then compared the magnitude of extractable surplus.
If you set honesty to $0, CDT-Will pays nothing and Derek (knowing this) never stops. FDT-Will still pays up to $1M and Derek extracts near-maximal surplus. The comparison reverses in valence: now CDT-Will dies and FDT-Will lives. This is just… standard Parfit’s Hitchhiker again.
The “Will chooses his decision theory” argument is incoherent as a critique of FDT. Poster 1 frames it as a problem that FDT-Will, given the choice, would “choose to be a CDT agent.” But that’s just FDT working as intended. FDT says “implement whatever algorithm leads to best outcomes.” If in this specific information-asymmetric environment the CDT-like algorithm dominates, FDT recommends it. This isn’t FDT losing—it’s FDT being a meta-theory that can recommend CDT-like behavior when appropriate. CDT can’t do the reverse; it can’t say “in Newcomb-like problems, implement FDT-like behavior” because CDT lacks the counterfactual reasoning to even identify when that’s beneficial.
Poster 2 gestures at this (“FDT can emulate CDT”) but doesn’t make the sharper point that this “emulation” is just FDT functioning correctly at the meta-level.
The information asymmetry is contrived in a way that’s load-bearing but presented as “realistic.” Derek has perfect knowledge of Will’s utility function, decision theory, and honesty parameter, while Will has zero strategic information about Derek’s pricing. Poster 1 claims this is more realistic than Newcomb, but a scenario requiring one-sided perfect utility-function-level mind-reading is not obviously closer to real-world bargaining. In actual markets both parties have partial information and the dynamics look very different. The “Amazon knows more about you” analogy is a much weaker asymmetry than “Amazon knows your exact utility function and decision theory with arbitrarily high confidence.”
An underexplored FDT subtlety with partial information. Under FDT, Will reasons about what happens when copies/simulations of his algorithm are evaluated by other agents. Poster 1 claims Will can’t do this because he doesn’t know Derek is modeling him—but this is confused about how FDT works. FDT-Will doesn’t need to know the specific modeling; he reasons “I am implementing algorithm A. In all environments where A is evaluated—including by potential predictors—what are its outputs?” The question of whether FDT requires knowledge that you’re being modeled to apply counterfactual reasoning is a genuinely deep question about FDT’s foundations, and it’s where the real philosophical action is. Neither poster engages with it.
Basically: poster 1 has constructed a scenario where the honesty parameter does the heavy lifting for CDT and the information asymmetry does the heavy lifting against FDT, then concluded that CDT “wins”—but the lesson is closer to “bargaining power determines surplus distribution, and agents who can credibly commit get better deals in symmetric info but are more exploitable under asymmetric info.” Which is true but is a general feature of commitment, not a special failure of FDT.

Firmament 14 Mar 2026 21:44 UTC
2 points
0
in reply to: DanielW’s comment on: DanielW’s Shortform
Ah, I see! So the scenario favors CDT only because Will lacks full information on the problem. Will thinks he’s playing Parfit’s Hitchhiker, but in reality he’s playing Ultimatum.
I dunno, it doesn’t really seem like a fair problem. You could construct a problem that unfairly favors CDT instead, like “Newcomb’s Problem but, unbeknownst to you, if you one-box, you die”.
When choosing his decision theory, maybe Will should decide to run FDT and then conservatively make CDT-like actions when he doesn’t have full information, if he’s expecting to encounter a lot of situations like this.

Firmament 14 Mar 2026 18:47 UTC
2 points
0
in reply to: DanielW’s comment on: DanielW’s Shortform
My knowledge of decision theory comes exclusively from reading relevant LessWrong posts when the mood takes me, but it seems to me that FDT-Will would instead act like this:
1. Assume Derek uses CDT and Will uses FDT (I do not know what would happen if both Derek and Will used FDT, but what happens would probably be the same as what happens when two FDT agents play the Ultimatum Game, since the situations are similar).
2. Imagine Derek demands $100,000,199.99 from Will.
3. When Will gets back to town, he reasons like so:
4. 1. “If I implement the algorithm that always pays Derek money when he requests less than $100,000,200, then the copy of the algorithm living in Derek’s mind when he was setting the price will also pay, meaning Derek will demand $100,000,199.99 from me, leaving me one cent better off.”
  2. “If I implement the algorithm that does not pay no matter what, then the copy of the algorithm living in Derek’s mind when he was setting the price will also not pay no matter what, meaning Derek will not demand any money at all, since he predicts that I will not pay any sum of money he asks for. Derek will then drive me to town free of charge (since he wants me to live), leaving me $1,000,200 better off.”
5. Will determines that the algorithm that doesn’t pay money no matter what gives better outcomes, so he doesn’t pay. Derek terminates the simulation and, in real life, gives Will a ride free of charge.
6. (Will then terminates that simulation and, in realer-life, decides to become an FDT agent)
Thus, FDT outperforms CDT, since CDT pays $199.99 and FDT pays nothing at all.

Firmament 9 Nov 2025 16:19 UTC
3 points
1
in reply to: teradimich’s comment on: Mourning a life without AI
Probabilities 1 and 2 are correlated, so you can’t multiply them.
Although, by multiplying them, you got a probability of 0.6, and the Manifold market on ASI existing by 2035 (34 traders) is trading at ~40%. So if the Manifold market is correct, then it’s true that the existential risk from ASI before 2035 is ≤0.5.
Regardless, technically the post author claimed that the probability for existential risk from ASI before 2045 was ≥0.5, and only said that it’s likely that better-than-human-AI would exist by 2035.

Firmament 15 Oct 2025 21:49 UTC
2 points
0
on: RiskiPedia
The book “You Bet Your Life: Your Guide to Deadly Risk” by Buff & Buff is a similar aggregation/breakdown of death statistics, though since it is a book it is not interactive. Perhaps you or other contributors could look through it for sources or inspiration.

Firmament 6 Feb 2025 17:06 UTC
2 points
0
in reply to: Artikan’s comment on: Game Theory As A Dark Art
1. Get a word and decide based on whether the sum of the alphabetical indexes (a=1, o=15, ect) is even or odd. To get the word, explain the above to the other person and then say you’re going to ask a question of them that they must answer quickly (to avoid computing the answer and then lying). Then say “what’s the name of your hometown” or something.
2. Both of you pick a number in secret. If the sum is even, you win. If the sum is odd, they win. Say the numbers at the same time to prevent cheating. Redo if someone stutters.
3. Rock-Paper-Scissors. Normally pretty easy to predict the other person’s actions, but if the stakes are high enough both players will climb to high enough layers of I-think-you-think-I-think that it’s equivalent to #2.
4. Pick a narrow axis like age, height, shoe size, latitude of birthplace, ect. Whoever has the bigger number in that axis wins.
5. This https://www.hillelwayne.com/post/randomness/