OK, I think I’m following you so far.
But geez, the discussion gets muddy.
Three things.
Consider the proposition (P) “Humanity embeds the right values.”
I think you’d agree that P isn’t necessarily true… that is, humanity might have evolved to embed the wrong values. If I released a brain-altering nanovirus that rewrote humanity’s moral code to resemble the Pebblesorters’, P would become false (although no human would necessarily notice).
You assert that P is true, though. So a few questions arise:
Why do you believe that P is true?
What would you expect to perceive differently, were P false?
I think you’d agree that these are important questions. But I cannot figure out, based on the series of posts so far, what your answers are.
Perhaps I’ve missed something important.
Or perhaps this gets clearer later.
You say:
The AI can’t start out with a direct representation of rightness, because the programmers don’t know their own values
Wait, wait, what?
By your own account, it’s possible for an optimizing process to construct a system that embeds the right values even if the process itself doesn’t know its own values… even, in fact, if the process itself lacks the right values.
After all, by your account natural selection did precisely this: it is not itself right, nor does it know its own values, and yet it constructed, out of unthinking matter, humans who are right.
In fact, not only is it demonstrably possible, it happened in the ONLY instance of natural selection constructing a human-level intelligence that we know of. So either:
A: we were impossibly lucky,
B: it is likelier than it seems that evolved intelligences are right, or
C: some kind of anthropic principle is at work.
A seems like a non-starter.
I suppose C could be true, though I don’t quite see what the argument would be. Anthropic arguments generally depend on some version of “if it weren’t true you wouldn’t be here to marvel at it,” and I’m not sure why that would apply here.
I suppose B could be true, though I don’t quite see why I would expect it. Still, it seems worth exploring.
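To make that weighing concrete, here’s a toy Bayes-rule calculation in Python. Every number in it is an invented placeholder, not an estimate I’d defend; the point is only the shape of the update.

```python
# Toy Bayes-rule comparison of the three explanations.
# All priors and likelihoods are invented placeholders.

# Hypotheses: A = "impossibly lucky", B = "evolved intelligences
# tend to be right", C = "anthropic selection effect".
priors = {"A": 0.2, "B": 0.4, "C": 0.4}  # before the observation

# Likelihood, under each hypothesis, of the one observation we
# have: the single known evolved human-level intelligence is right.
likelihoods = {"A": 1e-6, "B": 0.5, "C": 1.0}

evidence = sum(priors[h] * likelihoods[h] for h in priors)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

for h, p in sorted(posteriors.items()):
    print(f"P({h} | observation) = {p:.4f}")
```

Plug in whatever numbers you like: unless something rescues A’s tiny likelihood, it gets crushed in the update, which is all I mean by calling it a non-starter.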
If B were the case, it would follow that allowing a process of natural selection to do most of the heavy lifting to promote candidate AIs to our consideration, and then applying our own intelligence to the problem of choosing among candidate AIs, might be a viable strategy. (And likely an easier one to implement, as natural selection is a far simpler optimizing process to model than a human mind is, let alone the CEV of a group of human minds.)
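A minimal sketch of that division of labor, with a deliberately toy setup (the numeric candidates, the crude fitness proxy, and the careful evaluation function below are all invented stand-ins, not a proposal for how to actually score an AI):

```python
import random

# Stage 1 is a blind evolutionary loop doing the heavy lifting of
# generating candidates; stage 2 is a separate, more expensive
# judgment step applied only to the survivors.

def mutate(candidate):
    return candidate + random.gauss(0, 0.1)

def crude_fitness(candidate):
    # Cheap proxy the evolutionary process optimizes -- analogous
    # to reproductive fitness, NOT to rightness.
    return -abs(candidate - 1.0)

def careful_evaluation(candidate):
    # Deliberate, intelligent selection applied only at the end --
    # the stage where our own values get to do the choosing.
    return -abs(candidate - 1.0) - 0.01 * abs(candidate)

population = [random.uniform(-5, 5) for _ in range(50)]
for generation in range(200):
    # Stage 1: selection-and-variation promotes candidates.
    population.sort(key=crude_fitness, reverse=True)
    survivors = population[:10]
    population = survivors + [mutate(random.choice(survivors))
                              for _ in range(40)]

# Stage 2: apply our own judgment to the shortlist only.
finalists = sorted(population, key=careful_evaluation, reverse=True)
print("Chosen candidate:", finalists[0])
```

The appeal of the split is that the expensive, hard-to-model evaluator only ever has to rank a shortlist, while the simple-to-model selection process does the search.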
You also say:
(not to mention that there are other human beings out there than the programmers, if the programmers care about that).
Which, by your own account, they ought not. What matters is whether their AI is right, not whether the human beings other than the programmers are in any way involved in its creation. (Right?)