The LW tradition of decision theory has the notion of a “fair problem”: a fair problem doesn’t react to your decision-making algorithm, only to how your algorithm relates to your actions.
I realized that humans are, at least in some sense, “unfair”: we will probably react differently to agents whose different algorithms arrive at the same action, if the difference is whether the algorithms produce qualia.
The notion of ‘fairness’ discussed in e.g. the FDT paper is something like: it’s fair to respond to your policy, i.e. what you would do in any counterfactual situation, but it’s not fair to respond to the way that policy is decided.
I think the hope is that you might get a result like “for all fair decision problems, decision-making procedure A is better than decision-making procedure B by some criterion to do with the outcomes it leads to”.
Without the fairness assumption you could create an instant counterexample to any such result by writing down a decision problem where decision-making procedure A is explicitly penalised, e.g. Omega checks whether you use A and gives you minus a million points if so.
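To make the counterexample concrete, here is a minimal sketch. All names (`procedure_a`, `procedure_b`, `fair_problem`, `unfair_problem`) and the payoff numbers other than the minus a million are my own assumptions for illustration, and a Python identity check stands in for "Omega checks if you use A".

```python
from typing import Callable

Action = str  # "one-box" or "two-box" in this toy setup
Agent = Callable[[str], Action]


def procedure_a(observation: str) -> Action:
    """Hypothetical decision procedure A: always one-boxes."""
    return "one-box"


def procedure_b(observation: str) -> Action:
    """Hypothetical decision procedure B: arrives at the same action differently."""
    options = ["two-box", "one-box"]
    return max(options, key=lambda a: a == "one-box")


def fair_problem(agent: Agent) -> int:
    """Fair: the payoff depends only on the action the agent outputs."""
    action = agent("two boxes on the table")
    return 1_000_000 if action == "one-box" else 1_000


def unfair_problem(agent: Agent) -> int:
    """Unfair: looks at which procedure is running, not at what it does."""
    if agent is procedure_a:  # stand-in for "Omega checks if you use A"
        return -1_000_000
    return fair_problem(agent)


print(fair_problem(procedure_a), fair_problem(procedure_b))
print(unfair_problem(procedure_a), unfair_problem(procedure_b))
```

On the fair problem A and B score the same because they output the same action. On the unfair one A loses no matter what it outputs, so no claim of the form "A does at least as well as B on every problem" can survive.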
What distinguishes a cooperate-rock from an agent that cooperates in coordination with others is the decision-making algorithm. Facts about this algorithm also govern how the outcome can be known in advance or explained in hindsight: for a cooperate-rock it’s always “cooperate”, while for a coordinated agent it depends on how others reason, on their decision-making algorithms.
So in the same way that Newcomblike problems are the norm, so is the “unfair” interaction with decision-making algorithms. I think fairness is just a very technical assumption that doesn’t make sense conceptually, and violating it shouldn’t be framed as “unfairness”.
Isn’t the more technical definition of “fairness” here that the environment doesn’t distinguish between algorithms with the same policy, i.e. the same mapping <prior, observation_history> → action? I think it captures the difference between CooperateBot and FairBot.
As I understand it, “fairness” was invented as a response to the claim that it’s rational to two-box and Omega just rewards irrationality.
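A rough sketch of that policy-based reading, under heavy assumptions: policies are tabulated over a small finite set of observation histories, and `fair_bot_like` is a simplified stand-in that copies the last observed prediction about its opponent, not the provability-logic FairBot. All function names here are hypothetical.

```python
from itertools import product

ACTIONS = ("C", "D")
# All observation histories of length 0..2; each entry is a tuple of
# observed predictions about the opponent's move (an assumed encoding).
HISTORIES = [h for n in range(3) for h in product(ACTIONS, repeat=n)]


def cooperate_bot(prior: float, history: tuple) -> str:
    """Cooperates no matter what it has observed."""
    return "C"


def table_cooperator(prior: float, history: tuple) -> str:
    """Different algorithm, same policy: looks the answer up in a table."""
    table = {h: "C" for h in HISTORIES}
    return table[history]


def fair_bot_like(prior: float, history: tuple) -> str:
    """Simplified FairBot stand-in: mirrors the last observed prediction,
    falling back on the prior when nothing has been observed yet."""
    if not history:
        return "C" if prior >= 0.5 else "D"
    return history[-1]


def policy(agent, prior: float) -> dict:
    """Tabulate the mapping <prior, observation_history> -> action."""
    return {h: agent(prior, h) for h in HISTORIES}


def same_policy(a, b, prior: float = 0.5) -> bool:
    """A fair environment, on this reading, may only depend on this table."""
    return policy(a, prior) == policy(b, prior)


print(same_policy(cooperate_bot, table_cooperator))
print(same_policy(cooperate_bot, fair_bot_like))
```

On this reading a fair environment has to treat `cooperate_bot` and `table_cooperator` identically, but it may treat `fair_bot_like` differently, since that policy departs from always-cooperate on any history ending in "D".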
There is a difference in external behavior only if you need to communicate knowledge about the environment and the other players explicitly. If this knowledge is already part of an agent (or rock), there is no behavior of learning it, and so no explicit dependence on its observation. Yet still there is a difference in how one should interact with such decision-making algorithms.
I think this describes minds/models better (there are things they’ve learned long ago in obscure ways and now just know) than learning that establishes explicit dependence of actions on observed knowledge in behavior (which is more like in-context learning).
Decision theory as discussed here heavily involves thinking about agents responding to other agents’ decision processes.