How do you formalize the definition of a decision-theoretically fair problem, even when abstracting away the definition of an agent as well as embedded agency?
I’ve failed to find anything in our literature.
It’s simple to define a fair environment, given those abstractions: a function E from an array of actions to an array of payoffs, with no reference to any other details of the non-embedded agents that took those actions and received those payoffs.
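For concreteness, here is a minimal sketch of what such an environment looks like (my own toy example, not from any existing formalism; the payoff numbers are arbitrary): a Prisoner's Dilemma payoff function that maps an action profile to a payoff profile and never looks at who produced the actions.

```python
# Toy illustration: a fair environment is just a map from the players' actions to
# their payoffs, with no reference to which agents chose those actions.
from typing import Tuple

Action = str  # "C" or "D"

def prisoners_dilemma(actions: Tuple[Action, Action]) -> Tuple[float, float]:
    """Standard PD payoffs (illustrative numbers); depends only on the actions."""
    payoff_table = {
        ("C", "C"): (2.0, 2.0),
        ("C", "D"): (0.0, 3.0),
        ("D", "C"): (3.0, 0.0),
        ("D", "D"): (1.0, 1.0),
    }
    return payoff_table[actions]
```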
However, fair problems are more than just fair environments: we want a definition of a fair problem (and fair agents) under which, among other things:
The classic Newcomb’s Problem against Omega, with certainty or with 1% random noise: fair
Omega puts $1M in the box iff it predicts that the player consciously endorses one-boxing, regardless of what it predicts the player will actually do (e.g. misunderstand the instructions and take a different action than they endorsed): unfair
Prisoner’s Dilemma between two agents who base their actions on not only each others’ predicted actions in the current environment, but also their predicted actions in other defined-as-fair dilemmas: fair
(For example, PrudentBot will cooperate with you if it deduces that you will cooperate with it and also that you would defect against DefectBot, because it wants to exploit CooperateBots.)
Prisoner’s Dilemma between two agents who base their actions on each others’ predicted actions in defined-as-unfair dilemmas: unfair
Counting that as fair would let us smuggle in unfairness from other dilemmas; e.g. if BlueEyedBot only tries Löbian cooperation against agents with blue eyes, and MetaBlueEyedBot only tries Löbian cooperation against agents that predictably cooperate with BlueEyedBot, then the Prisoner's Dilemma against MetaBlueEyedBot should count as unfair. (A toy sketch of these bots appears just below.)
Modal combat doesn’t need to worry about this, because all the agents in it are fair-by-construction.
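To make the smuggling concrete, here is a crude toy sketch (my own, not the modal-combat formalism: "prediction" is just bounded simulation with an optimistic depth-0 default, so it does not reproduce Löbian PrudentBot-vs-PrudentBot cooperation). Its only purpose is to show MetaBlueEyedBot treating two bots with identical decision rules differently, purely because of BlueEyedBot's eye-color check.

```python
# Crude, non-Löbian stand-in for the bots above. "Prediction" is bounded simulation
# with an optimistic default at depth 0 -- a hack standing in for proof search.

def simulate(bot, opponent, depth):
    """Predict bot's move against opponent, with a recursion budget."""
    if depth <= 0:
        return "C"  # optimistic default; real modal combat uses provability logic
    return bot(opponent, depth - 1)

def cooperate_bot(opponent, depth):
    return "C"

def defect_bot(opponent, depth):
    return "D"

def prudent_bot(opponent, depth):
    # Cooperate iff the opponent is predicted to cooperate with me
    # and to defect against DefectBot (so CooperateBot gets exploited).
    if (simulate(opponent, prudent_bot, depth) == "C"
            and simulate(opponent, defect_bot, depth) == "D"):
        return "C"
    return "D"

def blue_eyed_bot(opponent, depth):
    # Conditions cooperation on a decision-irrelevant property ("blue eyes"),
    # modeled here as a function attribute: unfair.
    if getattr(opponent, "blue_eyes", False) and simulate(opponent, blue_eyed_bot, depth) == "C":
        return "C"
    return "D"

def meta_blue_eyed_bot(opponent, depth):
    # Cooperates iff the opponent predictably cooperates with BlueEyedBot
    # (simplified), thereby importing BlueEyedBot's eye-color test.
    if simulate(opponent, blue_eyed_bot, depth) == "C":
        return "C"
    return "D"

# Two FairBots with identical decision rules, differing only in the irrelevant flag.
def fair_bot_blue(opponent, depth):
    return "C" if simulate(opponent, fair_bot_blue, depth) == "C" else "D"
fair_bot_blue.blue_eyes = True

def fair_bot_plain(opponent, depth):
    return "C" if simulate(opponent, fair_bot_plain, depth) == "C" else "D"

if __name__ == "__main__":
    print(prudent_bot(cooperate_bot, 4))          # D: exploits CooperateBot
    print(prudent_bot(defect_bot, 4))             # D
    print(meta_blue_eyed_bot(fair_bot_blue, 4))   # C
    print(meta_blue_eyed_bot(fair_bot_plain, 4))  # D: same decision rule, no blue eyes
```

In the last two lines, the two FairBots differ only in the decision-irrelevant blue_eyes flag, yet MetaBlueEyedBot cooperates with one and defects against the other; that is the unfairness being imported.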
Yeah, I know, it’s about a decade late to be asking this question.
It sounds like you’re trying to define unfair as evil.
It’s an essential aspect of decision making for an agent to figure out where it might be. Thought experiments try to stipulate what the current situation is, but they can’t necessarily succeed in doing so convincingly. Algorithmic induction, such as updating from a Solomonoff prior, is the basic way an agent figures out which situations it should care about, and declaring that we are working with a particular thought experiment doesn’t affect the prior. In line with updatelessness, an agent should be ready for observations in general (weighted by how much it cares about each of them), rather than only for particular “fair” observations, so distinguishing observations that describe “fair” thought experiments doesn’t seem right either.
My current candidate definitions, with some significant issues in the footnotes:
A fair environment is a probabilistic function $F(x_1, \ldots, x_N) = [X_1, \ldots, X_N]$ from an array of actions to an array of payoffs.
An agent $A$ is a random variable
$$A(F, A_1, \ldots, A_{i-1}, A_i = A, A_{i+1}, \ldots, A_N)$$
which takes in a fair environment $F$[1] and a list of agents (including itself), and outputs a mixed strategy over its available actions in $F$.[2]
A fair agent is one whose mixed strategy is a function of subjective probabilities[3] that it assigns to [the actions of some finite collection of agents in fair environments, where any agents not appearing in the original problem must themselves be fair].
Formally, if $A$ is a fair agent with a subjective probability estimator $P$, then $A$’s mixed strategy in a fair environment $F$,
$$A(F, A_1, \ldots, A_{i-1}, A_i = A, A_{i+1}, \ldots, A_N),$$
should depend only on a finite collection of $A$’s subjective probabilities about outcomes
$$\{P(F_k(A_1, \ldots, A_N, B_1, \ldots, B_M)) = [X_1, \ldots, X_{N+M}]\}_{k=1}^{K}$$
for a set of fair environments $F_1, \ldots, F_K$ and, if needed, an additional set of fair[4] agents[5] $B_1, \ldots, B_M$ (note that not all agents need to appear in all environments).
A fair problem is a fair environment with one designated player, where all other agents are fair agents.
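To pin down the shapes involved, here is a type-level sketch of these candidate definitions (my own rendering; names like Environment, Agent, and MixedStrategy are illustrative, the environment's randomness is left implicit in the callable, and the subjective-probability machinery from footnote [3] is elided entirely).

```python
# Type-level sketch of the candidate definitions: it only pins down the shapes,
# not the fixed-point semantics or the "fair agent" restriction.
from typing import Callable, Dict, List, Tuple
import random

Action = str
Payoff = float
MixedStrategy = Dict[Action, float]  # probabilities summing to 1

# A fair environment: a (possibly stochastic) map from an action profile to a
# payoff profile, with no reference to which agents produced the actions.
Environment = Callable[[Tuple[Action, ...]], Tuple[Payoff, ...]]

# An agent: sees the environment and the full list of agents (with its own
# position i marked) and returns a mixed strategy over its available actions.
Agent = Callable[[Environment, List["Agent"], int], MixedStrategy]

def play(env: Environment, agents: List[Agent]) -> Tuple[Payoff, ...]:
    """Sample one play of a problem: each agent acts via its mixed strategy."""
    actions = []
    for i, agent in enumerate(agents):
        strategy = agent(env, agents, i)
        acts, probs = zip(*strategy.items())
        actions.append(random.choices(acts, weights=probs)[0])
    return env(tuple(actions))

if __name__ == "__main__":
    def coin_flipper(env, agents, i):  # trivial Agent, just to exercise the types
        return {"C": 0.5, "D": 0.5}
    pd = lambda acts: {("C", "C"): (2.0, 2.0), ("C", "D"): (0.0, 3.0),
                       ("D", "C"): (3.0, 0.0), ("D", "D"): (1.0, 1.0)}[acts]
    print(play(pd, [coin_flipper, coin_flipper]))
```

The hard part, which this sketch doesn't touch, is stating the restriction that makes an Agent a fair agent, i.e. constraining what its returned strategy may depend on.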
[1] I might need to require every $F$ to have a default action $d_F$, so that I don’t need to worry about axiom-of-choice issues when defining an agent over the space of all fair environments.
[2] I specified a probabilistic environment and mixed strategies because I think there should be a unique fixed point for agents, such that this is well-defined for any fair environment $F$. (By analogy to reflective oracles.) But I might be wrong, or I might need further restrictions on $F$.
[3] Grossly underspecified. What kinds of properties are required for subjective probabilities here? You can obviously cheat by writing BlueEyedBot into your probability estimator.
[4] This is an infinite recursion, of course. It works if we require each $B_m$ to have a strictly lower complexity in some sense than $A$ (e.g. the rank of an agent is the largest number $K$ of environments it can reason about when making any decision, and each $B_m$ needs to be lower-rank than $A$), but I worry that’s too strong of a restriction and would exclude some well-definable and interesting agents.
[5] Does the fairness requirement on the $B_m$ suffice to avert the MetaBlueEyedBot problem in general? I’m unsure.
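As a toy illustration of the fixed-point intuition in footnote [2] (my own, and only an analogy to the reflective-oracle story): matching pennies has no pure-strategy fixed point, but if each player plays a smoothed (logit) best response to the other's mixed strategy, iterating converges to the unique mixed fixed point at 50/50, provided the smoothing is strong enough that the update map is a contraction.

```python
# Toy fixed-point illustration: mutual smoothed best responses in matching pennies.
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def smoothed_best_response(p_other_heads: float, is_matcher: bool, beta: float = 1.5) -> float:
    """P(Heads) as a logit ("smoothed") best response in matching pennies."""
    # Sign of the expected-payoff advantage of Heads over Tails for this player:
    edge = (2 * p_other_heads - 1) if is_matcher else (1 - 2 * p_other_heads)
    return sigmoid(beta * edge)

# Starting from arbitrary mixed strategies, iterate mutual smoothed best responses.
# With beta < 2 this update is a contraction, so it converges to the unique mixed
# fixed point (0.5, 0.5); there is no pure-strategy fixed point to converge to.
p_matcher, p_mismatcher = 0.9, 0.2
for _ in range(200):
    p_matcher, p_mismatcher = (smoothed_best_response(p_mismatcher, True),
                               smoothed_best_response(p_matcher, False))
print(round(p_matcher, 3), round(p_mismatcher, 3))  # -> 0.5 0.5
```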