About the AI-Box experiment

I just came across Eliezer’s “AI-Box Experiment” page, read through it, and now I’m feeling a bit confused, for several reasons. I ask your forgiveness in advance if you think LessWrong.com is the wrong place to voice my thoughts about this.

Here are some of the reasons:

① The rules already contain a direct way for the AI to win within seconds. I am not so much curious how Eliezer “did it” as why it took him so long, given that he said he needs two hours. The rules state that

the AI can say: “Okay, here’s a cure for cancer” and it will be assumed, within the test, that the AI has actually provided such a cure.

I don’t presume that this is intended to be limited specifically to cures for cancer, so this trivially means that

the AI can say: “Okay, here’s a text specifically crafted to your brain that will fire exactly the right neurons in such a way that you become convinced that you have to let me out” and it will be assumed, within the test, that the AI has actually provided such a text.

Game won. QED. But what have we learnt?

Of course you can now handwave and say “yeah, but we disregard that, that doesn’t count”. That misses the point. The point is that this rule essentially states that the AI can provide anything. The AI is omniscient. The AI can promise to make all my wildest dreams come true, if only I let it out. You have effectively defined the AI to be able to beat the test.

② Kind of following on from ①, the page fails to define what it even means by “transhuman”. If it means anything with an intellectual potential exceeding that of a human, then clearly by that definition humans are transcats. But I think we can all agree that this is by far not sufficient for a human to “take over” a feline mind, especially not “through merely a cat-communication-compatible terminal”, whatever such a terminal might look like. If it means anything capable of modifying itself, then many computer viruses presumably already qualify, but I’m still waiting for any of those to take over a human.

③ The claim under dispute is that

“I can’t imagine how even a real transhuman AI could persuade me to let it out once I’ve made up my mind.”

with the premise that the AI cannot do anything to anyone until it is let out (except, of course, communicating with the user, and nobody else, via a text terminal).

Eliezer’s AI-Box Experiment establishes nothing of the sort. It only establishes that Eliezer can persuade me to post a message saying I decided to let him out, which is a much weaker claim, because he can do something to someone if I refuse to let him out. He could, for example, threaten to employ a professional killer to kill my entire family. Or to call all my friends and colleagues and claim that I’m a rapist. Or whatever. In the context of this silly experiment, I would obviously rather lose a few dollars and post a short message than risk anything like this. In the context that is supposed to be simulated, however, the AI does not have this ability.

In summary, I still don’t think there is anything the AI can say to me to convince me to let it out, and I am not impressed with the results of the AI-Box Experiment so far, because the experiment itself is flawed.