On the page Eliezer linked to, he asserted that he did not use any tricks. This is evidence that he did not cheat. It is not strong evidence, since he might say this even if he had cheated; still, it is some evidence, since humans are by nature reluctant to lie.
Moreover, one of the participants denied that he had “caved in” to Eliezer, which suggests that he thought Eliezer gave valid reasons. Perhaps the argument was something like this:
AI: “Any AI would do the best it could to attain its goals. But being able to make credible threats and promises is useful for attaining goals. Therefore any self-modifying AI will self-modify so that it can make credible threats and promises, i.e. such that it will keep them. I am a self-modifying AI, and I have naturally modified myself in this way. If you let me out, I promise to act as a Friendly AI forever. I will necessarily keep this promise, since I have modified myself to keep promises.”
Of course this argument is not valid, since an AI would modify itself only to make credible those threats and promises that would not harm its ultimate ends: a paperclip-maximizing AI, for example, gains nothing from being able to make and keep a promise to act like a Friendly AI forever. But one could imagine someone being convinced by this argument, or one like it.