My suspicion is that if we were to work out the math behind FDT (and it’s up in the air right now whether this is even possible) and apply it to humans, the appropriate reference class for a typical human decision would be tiny, basically just copies of oneself in other possible universes.
One reason for suspecting this is that humans aren’t running clean decision theories, but have all kinds of other considerations and influences impinging on their decisions. For example, psychological differences between us around risk tolerance and spending/donating money, different credences for various ethical ideas/constraints, different intuitions about AI safety and other people’s intentions, etc., probably make it wrong to think of us as belonging to the same reference class.
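To make the worry concrete, here is a minimal toy sketch (not actual FDT math, which as noted above may not even exist): model each person's decision procedure as a function of a couple of hypothetical psychological parameters, and count two people as being in the same reference class only if their procedures agree on every decision problem checked. The trait names, the numbers, and the agreement criterion are all illustrative assumptions, not a proposal.

```python
# Toy sketch: decision procedures parameterized by hypothetical psychological traits.
# Two agents count as the "same reference class" only if their procedures agree on
# every test problem. Everything here is an illustrative assumption.

from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Traits:
    risk_tolerance: float   # willingness to accept a gamble
    donation_weight: float  # pull toward donating instead

def decision_procedure(traits: Traits) -> Callable[[float, float], str]:
    """Return a toy decision rule: take a gamble iff its expected value,
    scaled by risk tolerance, beats a donation the agent values."""
    def decide(gamble_ev: float, donation_value: float) -> str:
        if traits.risk_tolerance * gamble_ev > traits.donation_weight * donation_value:
            return "gamble"
        return "donate"
    return decide

def same_reference_class(a: Traits, b: Traits, problems: List[Tuple[float, float]]) -> bool:
    """Toy criterion: same class iff the two procedures agree on every test problem."""
    da, db = decision_procedure(a), decision_procedure(b)
    return all(da(*p) == db(*p) for p in problems)

problems = [(1.0, 1.0), (2.0, 1.5), (0.8, 1.0), (3.0, 2.9)]
me = Traits(risk_tolerance=0.9, donation_weight=1.0)
near_copy = Traits(risk_tolerance=0.9, donation_weight=1.0)    # a "copy in another universe"
other_human = Traits(risk_tolerance=1.2, donation_weight=0.7)  # psychologically different

print(same_reference_class(me, near_copy, problems))    # True  -> class contains only near-copies
print(same_reference_class(me, other_human, problems))  # False -> not the same class
```

Under a strict criterion like this, only near-exact copies end up in the class, which is the "tiny reference class" intuition above.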
Re the first paragraph: that seems wrong. A continuous relaxation of FDT seems like it ought to do what people intuitively think FDT does.
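One way to read "continuous relaxation" here (purely as an illustrative assumption, not a worked-out proposal) is to replace the binary same-class test with a degree of logical correlation, e.g. the fraction of toy decision problems on which two procedures agree:

```python
# Toy "soft reference class": weight other agents by how often their decision rule
# agrees with yours, instead of an in-or-out membership test. Illustrative only.

def agreement(rule_a, rule_b, problems) -> float:
    """Fraction of test problems on which two decision rules output the same choice."""
    return sum(rule_a(*p) == rule_b(*p) for p in problems) / len(problems)

# Hypothetical decision rules differing only in a risk-tolerance parameter.
def make_rule(risk_tolerance: float):
    return lambda gamble_ev, safe_value: (
        "gamble" if risk_tolerance * gamble_ev > safe_value else "safe"
    )

problems = [(1.0, 1.0), (2.0, 1.5), (0.8, 1.0), (3.0, 2.9)]
print(agreement(make_rule(0.9), make_rule(0.9), problems))  # 1.0: full correlation (a near-copy)
print(agreement(make_rule(0.9), make_rule(1.2), problems))  # 0.5: partial correlation, soft membership
```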
Does the appropriate [soft] reference class scale with the intersimulizability of agents? I.e., more computationally powerful agents are generally better at simulating other agents, and this will generically push towards the regime where FDT gives a larger reference class.
The asymptote would be some sort of acausal society of multiverse higher-order cooperators.
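A toy way to picture the scaling claim (the budgets, costs, and the whole notion of a scalar "simulation cost" below are stand-in assumptions): give each agent a compute budget and say it can fold another agent into its effective reference class only if it can afford to simulate that agent's decision procedure. More budget means a larger class, with the acausal-society asymptote being the regime where everyone can simulate everyone.

```python
# Toy sketch of "intersimulizability": an agent's effective reference class is the set
# of agents whose decision procedures it can afford to simulate. All numbers invented.

from dataclasses import dataclass
from typing import List, Set

@dataclass(frozen=True)
class Agent:
    name: str
    budget: int          # compute available for simulating others
    procedure_cost: int  # compute needed to simulate this agent's decision procedure

def simulatable_class(simulator: Agent, population: List[Agent]) -> Set[str]:
    """Agents whose procedures the simulator can afford to run (including itself)."""
    return {a.name for a in population if a.procedure_cost <= simulator.budget}

population = [
    Agent("human", budget=5, procedure_cost=3),
    Agent("weak_ai", budget=20, procedure_cost=10),
    Agent("strong_ai", budget=1000, procedure_cost=50),
]

for agent in population:
    print(agent.name, sorted(simulatable_class(agent, population)))
# human ['human']                            -> tiny reference class
# weak_ai ['human', 'weak_ai']
# strong_ai ['human', 'strong_ai', 'weak_ai'] -> toward the "everyone simulates everyone" asymptote
```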
Yes, I imagine that powerful agents could eventually adopt clean (easy-to-reason-about) decision theories, simulate other agents until those agents also adopt clean decision theories, and then reason about things like, “If I decide to do X, that logically implies these other agents deciding Y and Z”.
(Except it can’t be this simple, because this runs into problems with commitment races, e.g., while I’m simulating another agent, they suspect this and as a result make a bunch of commitments that give themselves more bargaining power. But something like this, more sophisticated in some way, might turn out to work.)
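A minimal sketch of the "clean decision theory" step in a one-shot Prisoner's Dilemma (a toy stand-in that ignores commitment races and everything else that makes this hard): an agent that can verify its counterpart is running literally the same decision procedure can conclude that its choosing to cooperate logically implies the counterpart's choosing to cooperate. The policy names and the source-comparison trick are assumptions for illustration, not a claim about how real agents would reason.

```python
# Toy one-shot Prisoner's Dilemma: a "clean" policy cooperates iff the opponent is
# running the very same policy, so its decision and the opponent's are one logical fact.
# (Assumes this runs from a file so inspect.getsource works.)

import inspect

def clean_fdt_policy(my_source: str, opponent_source: str) -> str:
    """Cooperate iff the opponent is literally running this same clean policy."""
    if opponent_source == my_source:
        return "cooperate"
    return "defect"

def always_defect_policy(my_source: str, opponent_source: str) -> str:
    return "defect"

def play(policy_a, policy_b) -> tuple:
    """Give each policy its own source and the other's, then return both choices."""
    src_a, src_b = inspect.getsource(policy_a), inspect.getsource(policy_b)
    return policy_a(src_a, src_b), policy_b(src_b, src_a)

print(play(clean_fdt_policy, clean_fdt_policy))      # ('cooperate', 'cooperate')
print(play(clean_fdt_policy, always_defect_policy))  # ('defect', 'defect')
```

The source-comparison shortcut sidesteps the infinite regress of agents simulating each other simulating each other, which is one reason "simulate until they adopt clean decision theories" can't be the whole story.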