I acknowledge that this is a fair criticism from the FDT perspective (as witnessed by Wei Dai’s recent comment about how he declined the opportunity to invest in Anthropic).
To clarify a possible confusion, I do not endorse using “FDT” (or UDT or LDT) here, because the state of decision theory research is such that I am very confused about how to apply these decision theories in practice, and personally mostly rely on a mix of other views about rationality and morality, including standard CDT-based game theory and common sense ethics.
(My current best guess is that there is minimal “logical correlation” between humans so LDT becomes CDT-like when applied to humans, and standard game theory seems to work well enough in practice or is the best tool that we currently have when it comes to multiplayer situations. Efforts to ground human moral/ethical intuitions on FDT-style reasoning do not seem very convincing to me so far, so I’m just going to stick with the intuitions themselves for now.)
In this particular case, I mainly wanted to avoid signaling approval of Anthropic’s plans and safety views or getting personally involved in activities that increase x-risk in my judgement. Avoiding conflicts of interest (becoming biased in favor of Anthropic in my thoughts and speech) was also an important consideration.
Ah, sorry about mis-framing your comment! I tend to use the term “FDT” casually to refer to “instead of individual acts, try to think about policies and how they would apply to agents in my reference class(es)” (which I think does apply here, as I consider us to share a plausible reference class).
My suspicion is that if we were to work out the math behind FDT (and it’s up in the air right now whether this is even possible) and apply it to humans, the appropriate reference class for a typical human decision would be tiny, basically just copies of oneself in other possible universes.
One reason for suspecting this is that humans aren’t running clean decision theories, but have all kinds of other considerations and influences impinging on their decisions. For example psychological differences between us around risk tolerance and spending/donating money, different credences for various ethical ideas/constraints, different intuitions about AI safety and other people’s intentions, etc., probably make it wrong to think of us as belonging to the same reference class.
Re the first paragraph: that seems wrong; a continuous relaxation of FDT seems like it ought to do what people intuitively think FDT does. Does the appropriate [soft] reference class scale with the intersimulizability of agents? I.e., more computationally powerful agents are generally better at simulating other agents, and this will generically push towards the regime where FDT gives a larger reference class.
The asymptote would be some sort of acausal society of multiverse higher-order cooperators.
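For concreteness, here is a minimal toy sketch of what a “continuous relaxation” could look like; it is my own illustration rather than anything proposed in the thread, and the Prisoner’s Dilemma payoffs, the `soft_fdt_value` helper, and the single correlation parameter are all made-up simplifications.

```python
# Toy "soft FDT" sketch for a symmetric Prisoner's Dilemma.
# `correlation` is the probability that the other agent's decision is
# logically tied to mine; otherwise they act independently of my choice.

# My payoff for (my_action, their_action): C = cooperate, D = defect.
PAYOFF = {
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def soft_fdt_value(my_action, correlation, base_rate_c=0.5):
    """Expected payoff under a crude mixture of 'twin' and 'independent' worlds."""
    tied = PAYOFF[(my_action, my_action)]
    independent = (base_rate_c * PAYOFF[(my_action, "C")]
                   + (1 - base_rate_c) * PAYOFF[(my_action, "D")])
    return correlation * tied + (1 - correlation) * independent

def best_action(correlation):
    return max(["C", "D"], key=lambda a: soft_fdt_value(a, correlation))

if __name__ == "__main__":
    for rho in (0.0, 0.3, 0.6, 1.0):
        print(rho, best_action(rho))
    # Prints D, D, C, C: at correlation 0 this is the CDT answer (defect),
    # at correlation 1 it is the twin-FDT answer (cooperate).
```

Read this way, the claim upthread that there is minimal logical correlation between humans amounts to saying the relevant correlation parameter sits near zero, where the relaxation just reproduces CDT.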
Yes, I imagine that powerful agents could eventually adopt clean (easy to reason about) decision theories, simulate other agents until they also adopt clean decision theories, and then they can reason about things like, “If I decide to X, that logically implies these other agents making decisions Y and Z”.
(Except it can’t be this simple, because this runs into problems with commitment races, e.g., while I’m simulating another agent, they suspect this and as a result make a bunch of commitments that give themselves more bargaining power. But something like this, more sophisticated in some way, might turn out to work.)
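As a toy illustration of the “if I decide to X, that logically implies these other agents making decisions Y and Z” step: the sketch below is invented for illustration (the agents `agent_b` and `agent_c` and the stag-hunt payoffs are hypothetical), and it deliberately dodges the simulation fixed-point and commitment-race problems by letting the other procedures take my output as an explicit input.

```python
# Toy illustration of "if I decide X, that logically implies these other
# agents deciding Y and Z". agent_b and agent_c stand in for agents whose
# clean, fully transparent decision procedures I can evaluate directly.

def agent_b(my_output):
    # B's transparent rule: match whatever my procedure outputs.
    return my_output

def agent_c(my_output):
    # C's transparent rule: hunt stag only if I do.
    return "stag" if my_output == "stag" else "hare"

def my_payoff(me, b, c):
    # Stag-hunt-like payoffs: stag pays off only if everyone hunts stag.
    if me == "stag":
        return 4 if (b == "stag" and c == "stag") else 0
    return 2  # hare is safe regardless of the others

def my_decision():
    # Evaluate each candidate output under its logical implications for B
    # and C, rather than holding their choices fixed as CDT would.
    return max(["stag", "hare"],
               key=lambda x: my_payoff(x, agent_b(x), agent_c(x)))

if __name__ == "__main__":
    print(my_decision())  # "stag"
```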