I now think we can do this without prior coordination, in some sense, by using pseudorandom bitstrings and lots of oracle calls. (Idea came from a conversation with Alex Mennen.)
Intuition: if both players are sampling their acceptable outcome sets in random fashion, and each is heavily biased toward sampling points where they do better, then the first time that they both check the same point at the same level is very probably going to be near Eliezer’s fairness point.
(As noted in Quinn’s example, we’ll need to make the map from output-time to outcome injective, but we can do this via the transform suggested in that draft.)
Consider the distribution over the agent’s acceptable outcome set with probability density proportional to a positive exponential of the agent’s utility function UA:
p(a,b)∝exp(KuA(a,b)) if (a,b) acceptable to A, 0 otherwise.
Our agents need to be deterministic, so we’ll have a vast family of them parametrized by sequences of draws from this distribution. Given a large N∈N and a sequence of {(ai,bi):0≤i≤N drawn from that distribution, we have the agent
def A():
for i in range(N):
if PA+i proves A()=a_i implies B()=b_i:
return a_i
else return default Nash equilibrium
(If we want to do this even more generally and not assume they’re considering the same set of ordered pairs, we can instead ask for proofs that A()=ai→UA()≥uA(ai,bi), where UA is the actual utility of the universe from A’s perspective and uA is the utility of a particular point in the feasible set.)
If two such agents are playing against one another, then the probability (with respect to the sequence of draws that defined each agent) of a mutually acceptable ordered pair (a,b) being chosen at each step is proportional to exp(KAuA(a,b)+KBuB(a,b)), which is maximized at Eliezer’s acceptable point, and they have N chances to get it. For KA and KB large, and for N very large, with high probability two such agents will coordinate near Eliezer’s acceptable point.
I now think we can do this without prior coordination, in some sense, by using pseudorandom bitstrings and lots of oracle calls. (Idea came from a conversation with Alex Mennen.)
Intuition: if both players are sampling their acceptable outcome sets in random fashion, and each is heavily biased toward sampling points where they do better, then the first time that they both check the same point at the same level is very probably going to be near Eliezer’s fairness point.
(As noted in Quinn’s example, we’ll need to make the map from output-time to outcome injective, but we can do this via the transform suggested in that draft.)
Consider the distribution over the agent’s acceptable outcome set with probability density proportional to a positive exponential of the agent’s utility function UA:
p(a,b)∝exp(KuA(a,b)) if (a,b) acceptable to A, 0 otherwise.
Our agents need to be deterministic, so we’ll have a vast family of them parametrized by sequences of draws from this distribution. Given a large N∈N and a sequence of {(ai,bi):0≤i≤N drawn from that distribution, we have the agent
(If we want to do this even more generally and not assume they’re considering the same set of ordered pairs, we can instead ask for proofs that A()=ai→UA()≥uA(ai,bi), where UA is the actual utility of the universe from A’s perspective and uA is the utility of a particular point in the feasible set.)
If two such agents are playing against one another, then the probability (with respect to the sequence of draws that defined each agent) of a mutually acceptable ordered pair (a,b) being chosen at each step is proportional to exp(KAuA(a,b)+KBuB(a,b)), which is maximized at Eliezer’s acceptable point, and they have N chances to get it. For KA and KB large, and for N very large, with high probability two such agents will coordinate near Eliezer’s acceptable point.