There is alignment in the sense of an AI doing what is requested, and there is alignment in the sense of an AI whose values could be an acceptable basis for a transhuman extension of human civilization.
For the latter, there was originally CEV, the coherent extrapolated volition of humanity. There was some hope that humans have a common cognitive kernel from which a metaethics true to human nature could be inferred, and this would provide both the thing to be extrapolated and the way in which to extrapolate it, including an approach to social choice (aggregating the values of multiple agents).
Lately MIRI’s thinking has switched to the idea that there isn’t time to resolve all these issues before lethally unfriendly AGI is created by accident, so plan B is that the first AGI should be used to stop the world and freeze all AI research everywhere, until the meta-meta issues of method and goal are truly sorted out.
However, June Ku has continued with a version of the original CEV concept, which can be seen at Metaethical.AI.
The “alignment problem” humanity has as its urgent task is exactly the problem of aligning cognitive work that can be leveraged to prevent the proliferation of tech that destroys the world. Once you solve that, humanity can afford to take as much time as it needs to solve everything else.
Hi Mitchell, what would be the best thing to read about MIRI’s latest thinking on this issue (what you call Plan B)?
I don’t actually know. I only found out about this a few months ago. Before that, I thought they were still directly trying to solve the problem of “Friendly AI” (as it used to be known, before “alignment” became a buzzword).
This is the thread where I learned about plan B.
Maybe this comment sums up the new attitude:
Thanks Mitchell, that’s helpful.
I think we need a lot more serious thinking about Plan B strategies.