Re: “it doesn’t so strongly speak to the distinction which EY means to draw”
I wasn’t trying to do that. It seems like a non-trivial concept. Is it important to try and capture that distinction in a slogan?
Re: “one who at each moment makes the decision that maximises expected future utility defects”
Expected utility maximising agents don’t have commitment mechanisms, and can’t be trusted to make promises? I am sceptical. In my view, you can express practically any agent as an expected utility maximiser. It seems easy enough to imagine commitment mechanisms. I don’t see where the problem is.
In the Least Convenient Possible World, I imagine nobody has a commitment mechanism in the Prisoner’s Dilemma.
You can’t claim commitment mechanisms are not possible when in fact they evidently are. “Always cooperate” is an example of a strategy which is committed to cooperate in the prisoner’s dilemma.
“Commitment mechanism” typically means some way to impose a cost on a party for breaking the commitment; otherwise it is, in the game theorist’s parlance, “cheap talk”. In the one-shot PD, there is by definition no commitment mechanism, and it is in this LCPW that Eliezer’s decision theories are frequently tested.
You’re talking about the repeated PD with “always cooperate,” rather than the one-shot version, which was the scenario in which we found ourselves with Clippy. Please understand—I’m not saying EU-maxing agents do not have commitment mechanisms in general, just that the PD was formulated expressly to show the breakdown of cooperation under certain circumstances.
Regardless, always cooperate (AC) definitely does not maximize expected utility in the vast majority of environments. Indeed, it is not part of any stable equilibrium in a finite-time repeated PD (RPD). But more to the point, AC is only “committed” in the sense that, if given no opportunities afterward to make decisions, it will appear to produce committed behavior. It is unstable precisely because it requires no further decision points, whereas the RPD (in which it is played) has them every round.
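To make the expected-utility claim concrete, here is a minimal sketch of the one-shot stage game. The payoff values (T=5, R=3, P=1, S=0) are illustrative assumptions, not numbers from this thread, and the helper names are hypothetical. The sketch also assumes the opponent's move is independent of yours, which is exactly the assumption that a source-code-sharing scheme would call into question.

```python
# Illustrative one-shot Prisoner's Dilemma payoffs (assumed values, T > R > P > S).
T, R, P, S = 5, 3, 1, 0

# Payoff to "me" for (my_move, opponent_move); "C" = cooperate, "D" = defect.
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def expected_payoff(my_move, p_opponent_cooperates):
    """Expected payoff when the opponent cooperates with a fixed probability.

    Assumes the opponent's choice does not depend on my choice.
    """
    p = p_opponent_cooperates
    return p * PAYOFF[(my_move, "C")] + (1 - p) * PAYOFF[(my_move, "D")]


def best_response(p_opponent_cooperates):
    """Return the move with the higher expected payoff."""
    return max("CD", key=lambda move: expected_payoff(move, p_opponent_cooperates))


if __name__ == "__main__":
    for p in (0.0, 0.5, 1.0):
        print(p, expected_payoff("C", p), expected_payoff("D", p), best_response(p))
```

Under these assumptions defection comes out ahead for every value of the probability, which is the sense in which always cooperating fails to maximize expected utility in the stage game. It says nothing about settings where the other agent can condition on your decision procedure.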
You and I are using different definitions of “commitment mechanism”, then.
The idea I am talking about is demonstrating to the other party that you are a nice, cooperative agent, for example by showing the other agent your source code. That concept has nothing to do with crime and punishment.
The type of commitment mechanism I am talking about is one that convincingly demonstrates that you are committed to a particular course of action under some specified circumstances. That includes commitment via threat of retribution—but also includes some other things.
AC’s stability is tangential to my point. If you want to complain that AC is unstable, perhaps consider tit-for-tat (TFT) instead, which plays exactly the same move as AC on the first round.
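As a rough illustration of the AC versus TFT comparison, here is a small sketch of a repeated PD. The function names and the use of an always-defect opponent are assumptions chosen for the example, not anything specified in the thread.

```python
# Each strategy receives the list of moves the opponent has made so far
# and returns "C" (cooperate) or "D" (defect).

def always_cooperate(opponent_history):
    return "C"


def tit_for_tat(opponent_history):
    # Cooperate on the first round, then copy the opponent's previous move.
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play a repeated PD and return the list of (a_move, b_move) pairs."""
    seen_by_a, seen_by_b = [], []  # opponent moves observed by each side
    moves = []
    for _ in range(rounds):
        a = strategy_a(seen_by_a)
        b = strategy_b(seen_by_b)
        moves.append((a, b))
        seen_by_a.append(b)
        seen_by_b.append(a)
    return moves


if __name__ == "__main__":
    print(play(always_cooperate, always_defect, 3))
    print(play(tit_for_tat, always_defect, 3))
```

Against the always-defect opponent, both strategies open with cooperation, so they are indistinguishable on round one. They diverge from round two onward, where TFT retaliates and AC does not.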