Just want to note that I like your distinctions between Algorithm Land and the Real World and also between Level-1 optimization and Level-2 optimization.
I think some discussion of AI safety hasn’t been clear enough on what kind of optimization we expect in which domains. At least, it wasn’t clear to me.
But a couple of things fell into place for me about six months ago, which rhyme very much with your two distinctions:
1) Inexploitability only makes sense relative to a utility function. If the AI’s utility function is orthogonal to yours (e.g. because it is operating in Algorithm Land), then it may be exploitable relative to your utility function, even though it’s inexploitable relative to its own. See this comment (and thanks to Rohin for the post that prompted the thought). (A toy sketch of this point follows after point 2.)
2) While a process that’s optimizing super-hard for an outcome in Algorithm Land may bleed out into affecting the Real World, that effect would be accidental, and it seems much easier to mitigate than a process that’s trying to affect the Real World on purpose. See this comment.
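To make point 1 concrete, here’s a minimal sketch (in Python; every name and number in it is invented purely for illustration) of an agent whose utility is defined only over an Algorithm Land quantity. It will happily accept a trade that is a sure loss in Real World terms, so it is exploitable relative to our utility function while remaining perfectly coherent relative to its own:

```python
# Toy illustration of point 1. All names and numbers are hypothetical.

# A world-state is a pair (algorithm_score, real_world_resources).
# The agent's utility depends only on the Algorithm Land component;
# ours depends only on the Real World component.
def agent_utility(state):
    algorithm_score, _ = state
    return algorithm_score

def our_utility(state):
    _, real_world_resources = state
    return real_world_resources

before = (10.0, 100.0)  # initial state
after = (10.1, 0.0)     # a "trade": burn all Real World resources
                        # for a tiny Algorithm Land gain

# The agent accepts: its utility strictly increases, so no sequence of
# such trades can Dutch-book it by its own lights.
assert agent_utility(after) > agent_utility(before)

# Relative to our utility function, though, the same trade is a pure
# loss, so from our perspective the agent is trivially "exploitable".
assert our_utility(after) < our_utility(before)
```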
Putting them together: a randomly selected superintelligence doesn’t care about atoms, or about macroscopic events unfolding through time (roughly the domain of what we care about). And just because we run it on a computer that, from our perspective, is embedded in this macroscopic world and uses macroscopic resources (compute time, energy), doesn’t mean it’s going to start caring about macroscopic Real World events, or start fighting with us for those resources. (At least, not in a Level-2 way.)
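As a hedged illustration of the “randomly selected” intuition: if you sample a utility function uniformly at random over an agent’s internal states, the chance that it systematically favors states satisfying some fixed macroscopic predicate (say, “the machine stays plugged in”) shrinks with the size of the state space. A toy Monte Carlo sketch, with all parameters invented:

```python
import random

# Toy Monte Carlo: how often does a *random* utility function over
# internal states happen to "care about" a fixed macroscopic predicate?
# All parameters here are invented for illustration.

random.seed(0)
N_STATES = 1024   # internal states of a toy agent
N_TRIALS = 2000   # number of random utility functions to sample

# Arbitrary "macroscopic" predicate: half the states count as plugged in.
plugged_in = [s < N_STATES // 2 for s in range(N_STATES)]

def preference_gap(utility):
    """Difference in mean utility between plugged-in and unplugged states."""
    u_in = [utility[s] for s in range(N_STATES) if plugged_in[s]]
    u_out = [utility[s] for s in range(N_STATES) if not plugged_in[s]]
    return sum(u_in) / len(u_in) - sum(u_out) / len(u_out)

strongly_caring = 0
for _ in range(N_TRIALS):
    utility = [random.random() for _ in range(N_STATES)]
    if abs(preference_gap(utility)) > 0.05:  # arbitrary threshold
        strongly_caring += 1

# The gap is almost always tiny (it shrinks like 1/sqrt(N_STATES)), i.e.
# a random utility function very rarely prefers "plugged in" states.
print(f"{strongly_caring}/{N_TRIALS} random utilities 'cared' about the predicate")
```

The point of the sketch is only that “cares about a particular macroscopic variable” is a narrow target in the space of utility functions, not that real training processes sample from that space uniformly, which is exactly the caveat in the next paragraph.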
On the other hand, the powerful computing systems we actually build will not be randomly selected from the space of possible programs. We’ll have economic incentives to create systems that do model and act on the Real World.
So it seems to me that a randomly selected superintelligence may not actually be dangerous, because it doesn’t care about being unplugged: that’s a macroscopic concept that seems simple and natural from our perspective, but it would not correspond to anything in most utility functions. The superintelligent systems anyone is likely to actually build, though, will be much more likely to be dangerous, because they will model and/or act on the Real World.