There may be an interesting connection between this example and AIs knowing each other’s source code. The idea is, if one AI can unilaterally prove its source code to another without the receiver being able to credibly deny receipt of the proof, then it should change its source code to commit to an unfair agreement that favors itself, then prove this. If it succeeds in being the first to do so, the other side then has no choice but to accept. So, Freaky Fairness seems to depend on the details of the proof process in some way.
This presumes that the other side obeys standard causal decision theory; in fact, it’s an illustration of why causal decision theory is vulnerable to exploitation if precommitment is available, and suggests that two selfish rational CDT agents who each have precommitment options will generally wind up sabotaging each other.
This is a reason to reject CDT as the basis for instrumental rationality, even if you’re not worried that Omega is lurking around the corner.
You can reject CDT, but what are you going to replace it with? Until Eliezer publishes his decision theory and I have a chance to review it, I’m sticking with CDT.
I thought cousin_it’s result was really interesting because it seems to show that agents using standard CDT can nevertheless convert any game into a cooperative game, as long as they have some way to prove their source code to each other. My comment was made in that context, pointing out that the mechanism for proving source code needs to have a subtle property, which I termed “consensual”.
One obvious “upgrade” to any decision theory that has such problems is to discard all of your knowledge (data, observations) before making any decisions (save for some structural knowledge, to keep the decision algorithm nontrivial). For each decision that you would make (using the given decision algorithm) while knowing X, you instead make a conditional decision (using the same decision algorithm) that says “If X, then A; else B”, and only then recall whether X is actually true. This, for example, mends the particular failure of being unable to precommit: you remember that you are on the losing branch only after you have already decided to take a certain disadvantageous action if you are on the losing branch.
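A minimal sketch of this “upgrade” in code (all names here are illustrative; `decide` stands in for whatever the given decision algorithm is, and the observation space is assumed small and finite):

```python
def make_policy(decide, structural_knowledge, possible_observations):
    """Choose a conditional decision "if X then A else B" for every possible
    observation X, using only structural knowledge -- i.e. before recalling
    which observation actually holds."""
    return {x: decide(structural_knowledge, x) for x in possible_observations}

# Toy instance: committing to pay up on the losing branch of a coin flip.
def decide(_structural_knowledge, observation):
    return "pay" if observation == "lost" else "collect"

policy = make_policy(decide, None, ["won", "lost"])

# Only now recall the actual observation and read off the action.
actual_observation = "lost"
action = policy[actual_observation]  # the conditional decision binds
```

The point of the ordering is that `policy` is fixed before `actual_observation` is consulted, so learning that you are on the losing branch cannot change the decision.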
You can claim that you are using such a decision theory and hence that I should find your precommitments credible, but if you have no way of proving this, then I shouldn’t believe you, since it is to your advantage to have me believe you are using such a decision theory without actually using it.
From your earlier writings I think you might be assuming that AIs would be intelligent enough to just know what decision algorithms others are using, without any explicit proof procedure. I think that’s an interesting possibility to consider, but not a very likely one. But maybe I’m missing something. If you wrote down any arguments in favor of this assumption, I’d be interested to see them.
That was an answer to your question about what to replace CDT with. If you can’t convince other agents that you now run on timeless CDT, you gain a somewhat smaller advantage than you otherwise would, but that’s a separate problem. If you know that your claims of precommitment won’t be believed, you simply don’t precommit. But sometimes you’ll find a better solution than if you only lived in the moment.
Also note that even if you do convince other agents of the abstract fact that your decision theory is now timeless, it won’t help you very much, since it doesn’t prove that you’ll precommit in a specific situation. You only precommit in a given situation if you know that this action makes the situation better for you, which, in the case of cooperation, means that the other side will be able to tell whether you actually precommitted, and this is not at all the same as being able to tell what decision theory you use.
Since using a decision theory with precommitment is almost always an advantage, it’s easy to assume that a sufficiently intelligent agent always uses something of the sort, but that doesn’t allow you to know more about their actions—in fact, you know less, since such an agent has more options.
Yes, I see that your decision theory (is it the same as Eliezer’s?) gives better solutions in the following circumstances:
dealing with Omega
dealing with copies of oneself
cooperating with a counterpart in another possible world
Do you think it gives better solutions in the case of AIs (who don’t initially think they’re copies of each other) trying to cooperate? If so, can you give a specific scenario and show how the solution is derived?
Unless, of course, you already know that most AIs will go ahead and “suicidally” deny the unfair agreement.
Yes. In the original setting of FF the tournament setup enforces that everyone’s true source code is common knowledge. Most likely the problem is hard to solve without at least a little common knowledge.
Hmm, I’m not seeing what common knowledge has to do with it. Instead, what seems necessary is that the source code proving process must be consensual rather than unilateral. (The former has to exist, and the latter cannot, in order for FF to work.)
A model for a unilateral proof process would be a trustworthy device that accepts a string from the prover and then sends that string, along with the message “1”, to the receiver if the string is the prover’s source code, and “0” otherwise.
A model for a consensual proof process would be a trustworthy device that accepts a string from each of the prover and the verifier, and sends the message “1” to both parties if the two strings are identical and represent the prover’s source code, and “0” otherwise.
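The two devices can be written out as a toy model (hypothetical code; `true_source` stands in for whatever ground truth the trustworthy device consults):

```python
def unilateral_device(claimed_source, true_source):
    """The prover alone submits a string; the receiver gets the string
    plus a verdict, whether the receiver wants it or not."""
    verdict = 1 if claimed_source == true_source else 0
    return claimed_source, verdict          # delivered to the receiver

def consensual_device(prover_string, verifier_string, true_source):
    """Both parties submit strings; a 1 goes to both only if the strings
    match each other and the prover's actual source code."""
    verdict = 1 if prover_string == verifier_string == true_source else 0
    return verdict                          # delivered to both parties

src = "def agent(): ..."
assert unilateral_device(src, src) == (src, 1)
assert consensual_device(src, src, src) == 1
# The verifier's participation is required: no matching guess, no proof.
assert consensual_device(src, "a different guess", src) == 0
```

The difference that matters below is the verifier’s role: in the unilateral model the proof lands on the receiver regardless of what the receiver does; in the consensual model the proof only goes through if the verifier supplies a matching string.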
In your second case one party can still cheat by being out of town when the “1” message arrives. It seems to me that the whole endeavor hinges on the success of the exchange being common knowledge.
I’m not getting you. Can you elaborate on which party can cheat, and how? And by “second case” do you mean the “unilateral” one or the “consensual” one?
The “consensual” one.
For a rigorous demonstration, imagine this: while preparing to play the Freaky Fairness game, I managed to install a subtle bug into the tournament code that will slightly and randomly distort all source code inputs passed to my algorithm. Then I submit some nice regular quining-cooperative program. In the actual game your program will assume I will cooperate, while mine will see you as a defector and play to win. When the game gives players an incentive to misunderstand, even a slight violation of “you know that I know that you know...” can wreak havoc, hence my emphasis on common knowledge.
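The exploit can be caricatured in a few lines (a toy model; real quining cooperation compares programs semantically, but plain string comparison is enough to show the asymmetry):

```python
import random

SOURCE = "quining_cooperative_program"

def play(my_source, opponents_source_as_seen):
    # Cooperate iff the opponent's source code, as this program sees it,
    # matches its own -- a crude stand-in for quining cooperation.
    return "C" if opponents_source_as_seen == my_source else "D"

def distorted(s, rng=random.Random(0)):
    # The planted tournament bug: flip one bit of one character of any
    # source code input passed to the cheater's program.
    i = rng.randrange(len(s))
    return s[:i] + chr(ord(s[i]) ^ 1) + s[i + 1:]

# Your program sees my true source and cooperates; mine sees a slightly
# distorted copy of yours, classifies you as a defector, and plays to win.
your_move = play(SOURCE, SOURCE)
my_move = play(SOURCE, distorted(SOURCE))
```

Nothing about either program’s published source reveals the asymmetry; it lives entirely in the channel through which the source code is delivered, which is why the argument turns on common knowledge of the setup rather than on the programs themselves.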
I see what you’re saying now, but this seems easy to prevent. Since you have changed your source code to FF, and I know you have, I can simply ask you whether you believe I am a defector, and treat you as a defector if you say “yes”. I know your source code, so I know you can’t lie (specify Freaky Fairness to include this honesty). Doesn’t that solve the problem?
ETA: There is still a chance of accidental miscommunication, but you no longer have an incentive to deliberately cheat.
In this solution you have an incentive to similarly be out of town when I say “no”. Think it through recursively. Related topics: the Two Generals Problem, two-phase commit.
Ok, let’s say that two FFs can establish a cryptographically secure channel. The two players can each choose to block the channel at any time, but they cannot read, inject, delete, or reorder messages. Is that sufficient to make it arbitrarily unlikely for any player to put the FFs into a state where FF1 will treat FF2 as a cooperator, but FF2 will treat FF1 as a defector? I think the answer is yes, using the following protocol:
FF1 will start by sending a 1 or 0 (chosen randomly) to FF2. After that, each FF sends a 1 or 0 only after it receives a 1 or 0 from the other, keeping the number of 1s it has sent no more than the number of 1s it has received plus one. If an FF receives N 1s before a time limit is reached, it will treat the other as a cooperator; otherwise, as a defector. Now, in order to cheat, a player would have to guess when to block the channel, and the probability of guessing the right time goes to 0 as N goes to infinity.
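The exchange above can be sketched in code (illustrative Python; blocking the channel is modeled as cutting the message loop at an arbitrary index):

```python
import random

def run_exchange(time_limit, block_at=None, seed=0):
    """Alternate messages between FF1 and FF2, each sending a random bit
    subject to: number of 1s sent <= number of 1s received + 1.
    Returns how many 1s each FF received before the channel was cut."""
    rng = random.Random(seed)
    received = [0, 0]   # 1s received by FF1, FF2
    sent = [0, 0]       # 1s sent by FF1, FF2
    sender = 0          # FF1 sends first
    for step in range(time_limit):
        if block_at is not None and step >= block_at:
            break       # a player blocks the channel at this point
        allowed = sent[sender] < received[sender] + 1
        bit = rng.choice([0, 1]) if allowed else 0
        if bit:
            sent[sender] += 1
            received[1 - sender] += 1
        sender = 1 - sender
    return received

# Whenever the channel is cut, the two received-counts differ by at most
# one, so the asymmetric outcome (one side reaches N ones, the other
# does not) requires blocking inside a one-message window -- which a
# player who cannot read the channel must guess blindly.
for cut in range(0, 50, 7):
    a, b = run_exchange(50, block_at=cut, seed=1)
    assert abs(a - b) <= 1
```

The invariant in the final loop is the heart of the argument: the sent-at-most-received-plus-one rule keeps the two counters within one of each other at every step, so there is never more than one cut point per threshold that produces the asymmetric classification.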
This is not necessarily the most efficient protocol, but it may be good enough as a proof of concept. On the other hand, the “merger by secure joint construction” approach seems to have the advantage of not having to deal with this problem. Or is there an analogous one that I’m not seeing?