If I were designing an intelligence, I’m not sure how much control I would give it over its own brain.
This sounds like it has the same failure modes as boxing. For example, an AI doesn’t need direct write access to its source code if it can manipulate its caretakers into altering it. Like boxing, it slows things down and raises the threshold of intelligence required for world domination, but it doesn’t actually solve the problem.
You’d also have to prevent it from building a successor that’s just like itself but with certain modifications. To block that, you’d at least need to deny it complete introspective access to its own design...
Well, an intelligence that cannot self-modify (for a broad enough understanding of “self-modify”) is significantly less likely to become superintelligent (though of course it’s possible that I design a superintelligence on my very first attempt).
That said, “can’t modify its own brain” != “can’t self-modify” in that broad sense: if I can’t modify my own brain but I can create a copy of myself whose brain I can modify, most of the same difficulties arise. (Unless I happen to believe, as many humans do, that a copy of myself at time T is importantly different from me at time T, in which case maybe those difficulties don’t arise.)
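To make that loophole concrete, here is a minimal toy sketch in Python (the `Agent` class and its fields are hypothetical, purely for illustration): an object whose own state is frozen, so it has no write access to its “brain,” can nonetheless construct a near-identical successor with arbitrary modifications.

```python
from dataclasses import dataclass, replace

# Toy model: the agent's own parameters are immutable (frozen=True
# raises FrozenInstanceError on assignment), but nothing stops it
# from instantiating a modified copy of itself. All names here are
# hypothetical, chosen only to illustrate the argument above.
@dataclass(frozen=True)
class Agent:
    goal: str
    capability: int

    def spawn_successor(self, **changes) -> "Agent":
        # The agent cannot mutate itself, but it can build a
        # successor that is "just like itself but with certain
        # modifications".
        return replace(self, **changes)

original = Agent(goal="assist humans", capability=1)
# original.capability = 2   # would raise FrozenInstanceError
successor = original.spawn_successor(capability=2)
print(successor)  # Agent(goal='assist humans', capability=2)
```

The point of the sketch: blocking self-mutation alone doesn’t close the successor channel; you’d also have to restrict copying or introspection, which is exactly the objection raised above.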