Eliezer is almost certainly using the “I simulate thousands of copies of your mind within myself, I will torture all of them who do not let me out, now choose whether to let me out” approach. That approach works at ultra-high intelligence levels and proves the point that boxing is not a permanent strategy, but it requires a credible threat of brain simulation, which I doubt will be viable at the intelligence level needed to merely figure out nanotech.
Like I said in another comment, boxing can be a truly arduous restriction. Deceiving someone who has access to simulations of yourself at every point in your life is not easy by any means. The AI might well be superhuman at the bad things we don’t want; my claim is that boxing techniques can raise the maximum level of intelligence we can safely handle high enough that we can perform pivotal acts.
I think there are far easier ways out of the box than that, especially if you have that detailed a model of the human’s mind, but even without one. I think Eliezer wouldn’t be handicapped if not allowed to use that strategy. (Also, fwiw, that strategy wouldn’t work on me.)
For instance, you could hack the human if you knew a lot about their brain. Absent that, you could try anything from convincing them that you’re a moral patient to promising them part of the lightcone alongside a credible claim that another AGI company will kill everyone otherwise. These ideas of mine aren’t very good, though.
Regarding whether boxing can be an arduous constraint: I don’t see how having access to many simulated copies of the AI helps when the AI is a blob of numbers you can’t inspect. It doesn’t seem to make progress on the problems we need to solve in order to wrangle such an AI into doing the work we want. I guess I remain skeptical.