An old comment from Unknown, in that thread:

In fact, a superintelligent AI would easily see that the Pebble people are talking about prime numbers even if they didn’t see that themselves, so as long as they programmed the AI to make “correct” heaps, it certainly would not make heaps of 8, 9, or 1957 pebbles. So if anything, this supports my position: if you program an AI that can actually communicate with human beings, you will naturally program it with a similar morality, without even trying.
Actually, it’s you who can easily see that the Pebble people are talking about prime numbers, even though they don’t know it themselves. It’s easy for you to see that an AI would figure out prime numbers; the Pebblesorters have no such confidence.
To put it another way:
The Pebblesorters build an AI, and a critic challenges the decision to turn it on. The question under debate is “Will the AI make correct heaps?”
An elder stands up and says, “Yes, it will.” The critic says, “How do you know?” The elder replies, “It must make correct heaps!” The critic asks, “Why must it make correct heaps?” The elder says, “Well, it’s obvious! It’s so easy to see whether a heap is correct or incorrect; how could something so smart miss it?”
Then you stand up and say, “Yes, it will.” The critic says, “How do you know?” You reply, “Well, prime numbers are a fundamental part of reality; more fundamental still for a mind that is more like a computer than ours. In order for an AI to be powerful, it has to perform some task isomorphic to detecting complex patterns; it seems extremely unlikely that any pattern-finding mechanism that misses prime numbers could possibly support powerful optimisation processes. And so we can be pretty sure that the AI will build heaps of prime numbers only.” The critic responds, “What the hell are prime numbers?” You say, “Oh! Some unimportant mathematical property, but it turns out that no pile of pebbles that has this property is incorrect, and no pile of pebbles that lacks it is correct, so it acts as a good constraint on the AI.”
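To make that constraint concrete, here is a minimal sketch (my own illustration with a hypothetical is_correct_heap helper, not anything the parable specifies) of the test the AI would effectively be applying, on the assumption that a heap is correct exactly when its pebble count is prime:

    # Sketch only: assume "correct heap" means "prime number of pebbles".
    def is_correct_heap(n: int) -> bool:
        """Return True iff a heap of n pebbles is correct, i.e. n is prime."""
        if n < 2:
            return False
        d = 2
        while d * d <= n:  # trial division up to the square root of n
            if n % d == 0:
                return False
            d += 1
        return True

    # The heap sizes Unknown mentions all fail the test; a prime-sized heap passes.
    for heap in (8, 9, 1957, 13):
        print(heap, "correct" if is_correct_heap(heap) else "incorrect")
    # 8, 9 and 1957 (= 19 * 103) come out incorrect; 13 comes out correct.

The point of the sketch is only that the correctness criterion is a short, checkable property of the heap itself; nothing in it mentions what the Pebblesorters value or why.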
Some morals from my story:
Notice that the elder is not justified in his belief and should not turn on the AI, while you are justified, perhaps even justified enough to turn it on. Notice also that when it comes to human morality, we are more like the elder.
“Prime numbers” is an exceedingly simple concept, yet even so I was only able to reach the “we can be pretty sure” level of certainty.
Your explanation is longer and more complicated, and it has many more ways of failing to be true. Indeed, it’s only the simplicity of the concept that lets me formulate such an explanation and remotely expect it to be correct. (Even then, I’m pretty sure there are a few nits to pick.)
The response to Unknown sums up the issue already, though.
You may be justified in concluding that the AI will figure out what they’re doing. You’re not justified in assuming it will then act on that knowledge rather than identifying and pursuing its own purposes (presuming you’ve codified “purpose” well enough that it doesn’t just sit there and modify its own utility function to produce the computer equivalent of shooting up heroin).
Until you know what you’re doing, you can’t get something else to do it for you. An AI programmed without knowledge of what they wanted it to do might cooperate, or it might not. It would be better to start over and program it specifically to do what you want it to do.