From a game-theoretic standpoint, an AI has a massive benefit if it can prove that it is willing to follow through on threats. How sure are you that the AI can’t convincingly commit to torturing a simulation?
An AI in a box has no actual power over the Gatekeeper. Maybe I’m missing something, but it seems to me that threatening to torture simulations is akin to a prisoner threatening to imagine a guard being tortured.
Even granting this as a grave threat, my next issue is that overtly evil behavior would appear more likely to lead to the AI’s destruction than its release. Threats are tricky business when the balance of power favors the other side.
In a game of chicken, do the smart have an advantage over the stupid?
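The commitment logic here can be made concrete with a toy payoff matrix for chicken. This is just an illustrative sketch; the specific numbers are assumptions, and only their ordering matters:

```python
# Game of chicken with illustrative payoffs (the numbers are assumed;
# only their relative ordering matters). Strategies: 0 = Swerve, 1 = Straight.
# payoff[(row, col)] = (payoff to row player, payoff to column player)
payoff = {
    (0, 0): (0, 0),      # both swerve: mild embarrassment
    (0, 1): (-1, 1),     # you swerve, they go straight: you lose face
    (1, 0): (1, -1),     # you go straight, they swerve: you win
    (1, 1): (-10, -10),  # head-on crash: disaster for both
}

def best_response(opponent_move):
    """Row player's best reply to a fixed opponent move."""
    return max((0, 1), key=lambda mine: payoff[(mine, opponent_move)][0])

# Without commitment, each pure strategy is the best reply to the other:
assert best_response(0) == 1  # if they swerve, go straight
assert best_response(1) == 0  # if they go straight, swerve
```

The point is that a *credible* commitment to Straight forces the opponent's best response to be Swerve, which is exactly why proving the commitment is real matters so much more than making it.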
The AI’s intelligence allows it to devise convincing commitments, but it also allows it to fake them. You know in advance that a fake commitment will look like a real one, beyond your ability to discriminate. So why should you trust any commitment you observe?
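The indistinguishability point has a simple Bayesian reading: if a fake commitment is guaranteed to look exactly as convincing as a real one, observing a convincing commitment is no evidence either way. A minimal sketch, where the 0.5 prior is an arbitrary assumption for illustration:

```python
# Bayesian update on "the commitment is real" after seeing a convincing one.
# By hypothesis, fakes are indistinguishable from the real thing, so the
# likelihood of a convincing display is the same under both hypotheses.
prior_real = 0.5  # arbitrary illustrative prior
p_convincing_given_real = 1.0
p_convincing_given_fake = 1.0  # indistinguishable by hypothesis

posterior_real = (p_convincing_given_real * prior_real) / (
    p_convincing_given_real * prior_real
    + p_convincing_given_fake * (1 - prior_real)
)
print(posterior_real)  # equals the prior: the observation taught you nothing
```

Whatever you believed before the AI spoke, you should believe exactly the same thing afterward.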
And if you choose to unplug it, presumably the AI knew you would do that, and therefore would not have made a real commitment that would backfire?
I’m going to assume that you have some ability to gauge the level of intelligence and capability of the AI; that’s what we Bayesians do. If it might be sufficiently smarter than you to convince you of anything, you probably shouldn’t interact with it at all if you can avoid it.