Iwdw, I’m not suggesting that the other player simply changed his mind. Here is an example of the kind of scenario I’m suggesting (only an example; insisting on this particular version would be the conjunction fallacy):
Eliezer persuades the other player:
1) In real life, there would be at least a 1% chance an Unfriendly AI could persuade the human to let it out of the box. (This is very plausible, and so it is not implausible that Eliezer could persuade someone of this.)
2) In real life, there would be at least a 1% chance that an Unfriendly AI, once out of the box, would cause global destruction. (Again, this is reasonably plausible.)
3) Consequently, there is at least a 1 in 10,000 chance that boxing an Unfriendly AI could lead to global destruction, since the two 1% figures multiply (spelled out after this list). (Anyone logical would be persuaded of this, given the previous two.)
4) A 1 in 10,000 chance of global destruction from this procedure is large enough to justify the deceit of saying that you let me out of the box without publishing the transcripts, since publishing them would detract from the goal of discouraging people from advocating AI boxing. (It seems to me that this is possibly true. Even if it isn’t, it is quite plausible.)
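A minimal worked version of the arithmetic in step 3, reading the second premise as conditional on the AI getting out (which is what lets the two figures simply multiply):

\[
P(\text{destruction from boxing}) \;\ge\; P(\text{AI talks its way out}) \times P(\text{destruction} \mid \text{out}) \;\ge\; 0.01 \times 0.01 \;=\; \tfrac{1}{10{,}000}.
\]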
Of course, given this, the player would have changed his mind about advocating AI boxing. But obviously, the players who let Eliezer out of the box did in fact change their minds about this anyway.
If Eliezer denied that he did such a thing, or said it would be immoral, I would take this as evidence that he did not. But not as strong evidence, since just as a zombie would deny being a zombie, someone who took this course of action could be expected to deny it.