In contrast to Dagon, I’m quite confident that boxing WILL work, given that I’m allowed to do to the boxed entity this list of things, all of which I can do to an ML model:
- examine all its brain activity / activation states at each token / timestep (subjective millisecond)
- run it as slowly as I like, so that I can have days to analyze each of its subjective milliseconds
- wipe its memory and rerun it over the same (or different) data with different random seeds to initialize it
- save and contrast different paused versions run on slightly different data
- disable or inhibit any portion of its weights / neurons I choose, with precise sub-neuron targeting and whatever intensity of inhibition or noise-excitation I choose (the first sketch after this list illustrates this together with activation recording)
- use agent models of my choosing to be the direct viewers of the boxed entity’s output rather than myself, and then decide whether to destroy a given copy of the entity based on the output of my agent models (see the protocol sketch at the end of this comment)
- a few other such tricks that fit the description of ‘things I can do to ML models’
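To make that concrete, here is a minimal sketch of the activation-recording and neuron-inhibition tricks using PyTorch forward hooks. The toy model, layer choices, and intensity values are my own illustrative assumptions, not anyone’s actual boxing setup:

```python
# Minimal sketch (assumed toy model, not a real boxed system): record every
# activation with forward hooks, and inhibit chosen units at a chosen intensity.
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed, so a wiped-and-rerun copy is reproducible

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

activations = {}  # layer name -> activation tensor from the latest forward pass

def record(name):
    def hook(module, inputs, output):
        activations[name] = output.detach().clone()
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.Linear):
        module.register_forward_hook(record(name))

def inhibit(units, intensity=1.0):
    # intensity 1.0 fully silences the chosen units; lower values merely dampen
    def hook(module, inputs, output):
        out = output.clone()
        out[..., units] *= 1.0 - intensity
        return out  # returning a tensor from a forward hook replaces the output
    return hook

handle = model[1].register_forward_hook(inhibit(units=[0, 3], intensity=1.0))

y = model(torch.randn(1, 8))
print({k: tuple(v.shape) for k, v in activations.items()})  # every recorded state
handle.remove()  # lift the inhibition and rerun to contrast the two versions
```

Because a forward hook that returns a value replaces the module’s output, graded inhibition (intensity below 1.0) and noise-excitation are one-line variants of the same hook.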
I don’t see a convenient way to do these things to a human test subject, though.
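And here is a hedged sketch of the agent-models-as-viewers protocol mentioned above. Every name in it (screen_output, run_episode, the toy stand-ins) is hypothetical, just illustrating the control flow:

```python
# Illustrative protocol sketch: a screener agent is the only direct reader of
# the boxed model's output; the human operator sees only screened replies.
from typing import Callable, Optional

def screen_output(boxed_reply: str, screener: Callable[[str], str]) -> bool:
    """Return True if this copy of the boxed entity should be destroyed."""
    return screener(boxed_reply) == "suspicious"  # the screener, not a human, reads it

def run_episode(boxed_model: Callable[[str], str],
                screener: Callable[[str], str],
                prompt: str) -> Optional[str]:
    reply = boxed_model(prompt)
    if screen_output(reply, screener):
        return None   # destroy this copy; nothing reaches the human
    return reply      # only screened output is ever shown

# Toy stand-ins for demonstration only:
toy_boxed = lambda p: "Here is the proof sketch you asked for."
toy_screener = lambda r: "suspicious" if "ignore previous" in r.lower() else "ok"
print(run_episode(toy_boxed, toy_screener, "prove the lemma"))
```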
Very strong upvote and agreement from me. I think people are underestimating just how great a restriction ML-style boxing imposes on an agent. There exists an intelligence level at which all these tricks become useless, but before we reach that point, boxing would likely allow us to safely use mildly superhuman AIs to do things that might be pivotal. And each additional trick we discover increases the threshold of safely wieldable intelligence.
Some of your constraints, in particular the first two, seem like they would not be practical in the real world in which AI would be deployed. On the other hand, there are also other things one could do in the real world that can’t be done in this kind of dialogue, which makes boxing stronger in principle than the dialogue format suggests.
However, the real problem with boxing is that whoever boxes less is likely to have a more effective AI, which likely results in someone letting an AI out of its box, or, more likely, loosening the box constraints sufficiently to permit an escape.