An important fact is that whether your aim is Friendly AI or mind uploading, someone has to do neuroscience. As the author observes,
Such research [FAI] must not only reverse-engineer consciousness, but also human notions of morality.
In FAI strategy as currently conceived, the AI is the neuroscientist. Through a combination of empirical and deductive means, and with its actions bounded by some form of interim Friendliness (so it doesn’t kill people or create conscious sim-people along the way), the AI figures out the human decision architecture, extrapolates our collective volition as it would pertain to its own actions, and implements that volition.
Now note that this is an agenda each step of which could be carried out by all-natural human beings. Human neuroscientists could understand the human decision process, discover our true values and their reflective equilibrium, and act in accordance with the idealized values. The SIAI model is simply one in which all these steps are carried out by an AI rather than by human beings. In principle, you could aim to leave the AI out of it until human beings had solved the CEV problem themselves; and only then would you set a self-enhancing FAI in motion, with the CEV solution coded in from the beginning.
Eliezer has written about the unreliability of human attempts to formulate morality as a set of principles using intuition alone. Thus instead we are to delegate this investigation to an AI-neuroscientist. But to feel secure that the AI-neuroscientist is indeed discovering the nature of morality, and not some other similar-but-crucially-different systemic property of human cognition, we need its investigative methodology (e.g. its epistemology and its interim ethics) to be reliable. So either way, at some point human judgment enters the picture. And by the Turing universality of computation, anything the AI can do, humans can do too. They might be a lot slower, and they might have to work very redundantly to achieve the same reliability, but it should be possible for mere humans to solve the problem of CEV exactly as we would wish a proto-FAI to do.
Since the path to human mind uploading has its own difficulties and hazards, and still leaves the problem of Friendly superintelligence unsolved, I suggest that people who are worried about leaving everything up to an AI think about how a purely human implementation of the CEV research program would work—one that was carried out solely by human beings, using only the sort of software we have now.