What does UDT do in asymmetric zero-sum games? Here’s a simple such game: player 1 chooses which player is going to pay money to the other, and player 2 chooses the sum of money to be paid, either $100 or $200. The unique Nash equilibrium is player 2 paying $100 to player 1.
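To make the equilibrium claim concrete, here is a small sketch that enumerates the four action profiles and checks which ones are Nash equilibria. The payoff table and action names are my own encoding of the game described above, not anything fixed by the post.

```python
# Player 1's payoffs in the game above (zero-sum, so player 2's payoff is the
# negation). Player 1 picks who pays; player 2 picks the amount. The labels
# "p2_pays"/"p1_pays" are hypothetical names for this sketch.
payoff1 = {
    ("p2_pays", 100): 100,
    ("p2_pays", 200): 200,
    ("p1_pays", 100): -100,
    ("p1_pays", 200): -200,
}

def is_nash(a1, a2):
    """Check that neither player gains by deviating unilaterally."""
    best1 = all(payoff1[(a1, a2)] >= payoff1[(b, a2)] for b in ("p2_pays", "p1_pays"))
    best2 = all(-payoff1[(a1, a2)] >= -payoff1[(a1, b)] for b in (100, 200))
    return best1 and best2

equilibria = [(a1, a2)
              for a1 in ("p2_pays", "p1_pays")
              for a2 in (100, 200)
              if is_nash(a1, a2)]
# → [("p2_pays", 100)]: the unique equilibrium is player 2 paying $100.
```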
Here’s a simple formalization of UDT. Let the universe be a program U which returns a utility value, and the agent be a subprogram A within U, which knows the source code of both A and U by quining, and returns an action value. The program A looks for proofs in a formal theory T that the statement A()=a logically implies U()=u, for different values of a and u. Then after spending some time looking for proofs, A returns the value of a that corresponds to the highest value of u found so far.
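The selection rule in this formalization can be sketched as follows. Of course the real algorithm searches a formal theory T for proofs; here `find_proofs` is a hypothetical stand-in that returns whatever implications "A()=a implies U()=u" were proved within some effort bound, and the fallback action is an assumption (the post's version leaves the no-proofs case open).

```python
# Minimal sketch of the UDT agent described above. `find_proofs` stands in
# for the bounded proof search over statements "A()=a implies U()=u".

def udt_decide(find_proofs, default_action):
    """Return the action corresponding to the highest utility among proved
    implications, or a fallback if no proofs were found."""
    proved = find_proofs()  # list of (action, utility) pairs
    if not proved:
        return default_action  # nothing proved within the effort bound
    best_action, _ = max(proved, key=lambda au: au[1])
    return best_action

# Toy universe with a single agent, so each "A()=a implies U()=u" is
# trivially provable and the agent just picks the better action.
toy_proofs = lambda: [("take_100", 100), ("take_200", 200)]
udt_decide(toy_proofs, "take_100")  # → "take_200"
```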
If two players play the above game, using identical copies of the above algorithm and treating the game as a universe containing instances of both players, will they find any proofs at all? I suspect that the answer is no, because Nash equilibrium reasoning without perfect symmetry requires you to assume that the other player is rational, which fails by infinite regress, like in Eliezer’s “AI reflection problem”. But I’d really like someone else to double check this.
ETA: if we make the algorithm look for proofs of statements of the form “A()=a implies U()≥u” instead, it will minimax in zero-sum games, cooperate in the PD against a logically equivalent copy of itself, and revert to defection if the copy is slightly imperfect. Similarly, in Newcomb’s problem the algorithm one-boxes if Omega is provably right with high enough probability, and two-boxes otherwise. The drawback is that against itself it only minimaxes rather than reaching Nash equilibrium play. But it’s still nice because it captures “resonant doubt”.
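The modified selection rule can be sketched the same way: the agent now collects proved lower bounds "A()=a implies U()≥u" and picks the action with the best guarantee, which is why it minimaxes. Again, `find_lower_bounds` is a hypothetical stand-in for the proof search, and the toy PD payoffs are my own illustration.

```python
# Sketch of the ETA variant: pick the action with the highest proved
# lower bound on utility, i.e. the best guaranteed payoff.

def udt_maximin_decide(find_lower_bounds, default_action):
    bounds = find_lower_bounds()  # list of (action, lower_bound) pairs
    if not bounds:
        return default_action
    # Keep the best proved guarantee for each action...
    best = {}
    for a, u in bounds:
        best[a] = max(best.get(a, u), u)
    # ...then take the action whose guarantee is highest.
    return max(best, key=best.get)

# Toy PD against an exact logical copy: "the copy plays what I play" is
# provable, so the agent proves A()="C" implies U()>=2 and
# A()="D" implies U()>=1, and cooperates.
udt_maximin_decide(lambda: [("C", 2), ("D", 1)], "D")  # → "C"
```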