Your idea is a little similar to the one-action AI that I described some time ago. It’s a neat way to get goal stability (if you trust your magical math intuition module), but IMO it doesn’t solve all of decision theory. The tricky question is how to build the “probability model”, i.e. the assignment of expected utilities to all possible actions. A poorly chosen assignment can end up being mathematically true by making itself true through influencing your actions, like a self-fulfilling prophecy.
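To make the failure mode concrete, here’s a minimal Python sketch (the actions, utilities, and model values are all mine, purely illustrative): two “probability models” of the same world, each consistent with everything the agent ever observes, because the agent’s own argmax keeps the bad model’s false claim from ever being tested.

```python
def act(model):
    """The agent deterministically takes whichever action its model rates highest."""
    return max(model, key=model.get)

def true_utility(action):
    """The actual world: action "b" is genuinely better."""
    return {"a": 5, "b": 10}[action]

# An accurate model: the agent picks "b" and observes 10, as predicted.
good_model = {"a": 5, "b": 10}

# A "poisoned" model that slanders action "b". The agent therefore picks "a",
# observes 5 (as predicted), and the false claim about "b" is never tested.
# The model makes itself consistent with observation by steering the action.
bad_model = {"a": 5, "b": 0}

for model in (good_model, bad_model):
    a = act(model)
    print(a, "predicted:", model[a], "observed:", true_utility(a))
```

Both models pass every empirical check the agent ever runs, but the second one forfeits half the available utility; that is the sense in which a poorly chosen assignment can be “mathematically true” and still be the wrong one to act on.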
For example, imagine that an alien AI created a million years ago predicted that humanity would build an AI based on your decision theory, and precommitted to waging self-destructive war against us unless we give it 99% of our resources. Your AI knows everything about physics, so it will infer the existence of the alien AI at “time zero” and immediately give up the resources. But this decision of your AI was itself predicted by the alien AI, just as I predicted it here, and that’s why the alien AI made its precommitment in the first place. TDT tries to solve this problem by not giving in to extortion, though we don’t know how to formalize that.
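A minimal sketch of the trap, with hypothetical numbers (utilities measured as the fraction of resources humanity keeps): once the agent updates on the inferred precommitment, only the two conditional payoffs remain to compare, and caving wins.

```python
def updateful_choice(war_payoff=0.0, tribute_payoff=0.01):
    # Having inferred the alien precommitment from physics at "time zero",
    # the agent treats the extortion as a settled fact (P = 1) and simply
    # compares what is left of each branch:
    return "give in" if tribute_payoff > war_payoff else "refuse"

print(updateful_choice())  # -> "give in", exactly as the aliens predicted
```

The aliens’ prediction is correct precisely because this comparison is the one the agent runs.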
For a more interesting twist, consider that our universe is likely to contain many instances of us (e.g. if it’s spatially infinite, which is currently considered plausible), scattered all over spacetime. Will the different copies of your AI cooperate with each other, or will they do something stupid like wage war? UDT tries to solve this problem and others like it by accounting for logical copies implicitly, so they all end up cooperating.
TDT tries to solve this problem by not giving in to extortion, though we don’t know how to formalize that.
UDT can solve this problem by noticing that a decision to not give in to extortion makes the extortion improbable. TDT won’t be able to notice the scenario where the aliens never appear, and so won’t solve this problem, for the same reason it doesn’t solve Counterfactual Mugging (CM). (Does this mean that TDT doesn’t solve Newcomb’s problem with transparent boxes? I don’t remember hearing that, although I remember Drescher mentioning that CM is analogous to one of his thought experiments.) Eliezer, and not TDT, refers to the intuitive notion of “extortion” and advises against giving in to it.
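A sketch of the policy-level comparison (notation and numbers mine), with $u$ the fraction of resources humanity keeps and the aliens’ precommitment conditional on their prediction of our policy $\pi$:

$$\mathrm{EU}(\pi) = P(\text{extort} \mid \pi)\, u(\pi, \text{extort}) + P(\neg\text{extort} \mid \pi)\, u(\pi, \neg\text{extort}).$$

Since a self-destructive war against a predicted refuser gains the aliens nothing, $P(\text{extort} \mid \text{refuse}) \approx 0$, giving $\mathrm{EU}(\text{refuse}) \approx 1$ versus $\mathrm{EU}(\text{comply}) \approx 0.01$. An agent that first updates on having observed the extortion never weighs the $\neg\text{extort}$ branch, which is why the comparison comes out the other way for it.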
Will the different copies of your AI cooperate with each other, or will they do something stupid like wage war?
As Will recently pointed out, “cooperation” is itself an unclearly specified idea (in particular, the agent may well be a self-improving bundle of wires that quickly escapes any recognition unless it wants to signal something). Also, as I pointed out before, in the PD the Pareto frontier for mixed strategies includes one player cooperating while the other player cooperates or defects randomly (where the randomness can come from logical uncertainty). The players will just bargain over which of them defects, and with what probability.
So non-cooperation is not always stupid, both because “cooperation” is not a clear idea, and because random defection by one of the players remains on the Pareto frontier.
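This is easy to check numerically. Below is a small Python sketch (payoff values and grid resolution chosen by me) that enumerates correlated strategies over the four PD outcomes on a coarse grid and confirms that “player 2 cooperates, player 1 defects with some probability” stays on the Pareto frontier for every defection probability:

```python
from itertools import product

# Standard PD payoffs (row player, column player), T=5 > R=3 > P=1 > S=0.
OUTCOMES = [("CC", (3, 3)), ("CD", (0, 5)), ("DC", (5, 0)), ("DD", (1, 1))]
STEP = 20  # grid resolution for mixture weights

def payoff(weights):
    """Expected payoffs of a correlated distribution (weights sum to STEP)."""
    u1 = sum(w * p[0] for w, (_, p) in zip(weights, OUTCOMES)) / STEP
    u2 = sum(w * p[1] for w, (_, p) in zip(weights, OUTCOMES)) / STEP
    return (u1, u2)

# All grid mixtures over the four outcomes.
points = [payoff((a, b, c, STEP - a - b - c))
          for a, b, c in product(range(STEP + 1), repeat=3)
          if a + b + c <= STEP]

def pareto_optimal(pt):
    """True if no feasible point weakly dominates pt with a strict gain."""
    return not any(other[0] >= pt[0] and other[1] >= pt[1] and other != pt
                   for other in points)

# Player 2 always cooperates; player 1 defects with probability k/STEP.
for k in (0, 5, 10, 15, 20):
    pt = payoff((STEP - k, 0, k, 0))
    print(f"P(defect) = {k/STEP:.2f}  payoffs = {pt}  on frontier: {pareto_optimal(pt)}")
```

Every point on the segment between mutual cooperation (3, 3) and one-sided defection (5, 0) comes back as undominated, which is exactly the bargaining range described above.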
That’s funny. What you described in the second paragraph is something like a 2-player bimatrix game played across time and space in which the players aren’t even sure of their opponents’ existence and in which our player’s strategy is which decision theory he uses.
Very interesting, and great food for thought. But again, the complication comes from the possible existence of another player. I would argue that it’s reasonable to assume we have some ‘breathing room’ of one to two million years before we have to deal with other players. In that case, why not build a ‘naive’ FAI which operates under the assumption that there is no other player, let it grow, and then, when it has some free time, let it think of a decision theory for you? (I don’t know whether you speak for the SIAI, cousin_it, but I think it would be fair for an outsider to wonder why Yudkowsky thinks this route in particular has the greatest cost/benefit in terms of achieving FAI as fast as possible.)
I’m not affiliated with SIAI in any way. Just like you, I’m an outsider trying to clear up these topics for my own satisfaction :-)
Many people here think that we must get FAI right on the first try, because after it gains power it will resist our attempts to change it. If you code into the AI the assumption that it’s the only player, it won’t believe in other players even when it sees them, and will keep allocating resources to building beautiful gardens even as alien ships are circling overhead (metaphorically speaking). When you ask it to build some guns, it will see you as promoting a suboptimal strategy according to its understanding of what’s likely to work.
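The underlying point is just Bayes’ theorem with a zero prior: if the AI’s model assigns $P(\text{others}) = 0$, then no sighting can revive the hypothesis, since

$$P(\text{others} \mid \text{ships}) = \frac{P(\text{ships} \mid \text{others})\,P(\text{others})}{P(\text{ships})} = 0$$

whenever $P(\text{ships}) > 0$; the AI is forced to explain the sighting some other way, e.g. as sensor error.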
It might be preferable to build a less rigid AI that would be open to further amendments from humanity, rather than maximizing its initial utility function no matter what. But we don’t know any mathematical formalism that can express that. The first AIs are likely to be expected utility maximizers just because maximization of expected utility is mathematically neat.
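For concreteness, the mathematically neat object in question is the standard one-liner (standard notation, not specific to any proposal in this thread):

$$a^* = \arg\max_{a \in A} \sum_{o} P(o \mid a)\, U(o),$$

and the rigidity comes from $U$ sitting outside the maximization: no observation gives the agent a reason, by its own criterion, to amend $U$.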
The issue of rigidity is a broad and important topic which has been insufficiently addressed on this site. A ‘rigid’ AI cannot be considered rational, because all rational beings are aware that their reasoning processes are prone to error. I would go further and say that a rigid FAI can be just as dangerous (in the long term) as a paperclip maximizer. However, the problem of implementing a ‘flexible’ AI would indeed be difficult. Such an AI would be a true inductive agent—even its confidence in the solidity of mathematical proof would be based on empirical evidence. Thus it would be difficult to predict how such an AI might function—there is a risk that the AI would ‘go insane’ as it loses confidence in the validity of the core assumptions underlying its cognitive processes. But this is already taking us far afield of the original subject of discussion.