What prevents mankind from designing a corrigible alignment researcher, keeping it deployed internally and ordering it to create an ASI which incorrigibly protects things like liberal democracy or mankind’s CEV?
Or from coming up with a semi-corrigible alignment target which protects us from a lock-in in a different manner?
In that case we’ve passed on the difficulty to the corrigible alignment researcher, while also accepting the constraint “pass on the task to a corrigible alignment researcher whose corrigibility etc you can also trust.”
Could you sketch this out further?
First of all, I don’t actually understand what the economy of the future will look like even if the AI is optimally aligned. Assuming that AIs and robots automate away work, I expect the post-ASI economy to be reduced to satisfying humans’ resource-costing demands, which would likely require the material resources available to mankind to be distributed in a rather egalitarian manner and/or according to every human’s capabilities rather than a position locked in ages ago. See also Amodei’s take, out of which I crossed out a sentence because I don’t believe it in the slightest:
Amodei’s take
However, I do think in the long run AI will become so broadly effective and so cheap that this will no longer apply. At that point our current economic setup will no longer make sense, and there will be a need for a broader societal conversation about how the economy should be organized.
While that might sound crazy, the fact is that civilization has successfully navigated major economic shifts in the past: from hunting and gathering to farming, farming to feudalism, and feudalism to industrialism. I suspect that some new and stranger thing will be needed, and that it’s something no one today has done a good job of envisioning. It could be as simple as a large universal basic income for everyone, although I suspect that will only be a small part of a solution. It could be a capitalist economy of AI systems, which then give out resources (huge amounts of them, since the overall economic pie will be gigantic) to humans based on some secondary economy of what the AI systems think makes sense to reward in humans (based on some judgment ultimately derived from human values). Perhaps the economy runs on Whuffie points. Or perhaps humans will continue to be economically valuable after all, in some way not anticipated by the usual economic models. All of these solutions have tons of possible problems, and it’s not possible to know whether they will make sense without lots of iteration and experimentation. And as with some of the other challenges, we will likely have to fight to get a good outcome here: exploitative or dystopian directions are clearly also possible and have to be prevented. Much more could be written about these questions and I hope to do so at some later time.
Secondly, Max Harms’ CAST sequence contains an attempt to formalise power and to have the agent act so that its actions differentially increase the principal’s utility, in a way that actions guided by different values would not. What if an alternate-universe CAST instead had the agent act so that the host’s own actions make as much difference to the host’s utility function as possible? I suspect that such an agent would help only with tasks close to the host’s capabilities, thus preventing the Intelligence Curse entirely. See also Yudkowsky’s Fun Theory sequence.
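To make the conjecture a bit more concrete, here is a toy sketch in Python. It is not Harms’ formalism. The logistic success model, the function names, and all the numbers are assumptions made up for illustration. The agent picks a level of help that maximises the gap between the host’s chance of success when the host tries and when the host does not, which is one possible reading of "the host’s actions make as much difference as possible".

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def success_prob(host_tries: bool, capability: float,
                 help_level: float, difficulty: float) -> float:
    """Toy chance that the task succeeds (hypothetical model, not from CAST)."""
    own_contribution = capability if host_tries else 0.0
    return sigmoid(own_contribution + help_level - difficulty)


def host_influence(capability: float, help_level: float,
                   difficulty: float) -> float:
    """How much the host's own choice to try changes the chance of success."""
    return (success_prob(True, capability, help_level, difficulty)
            - success_prob(False, capability, help_level, difficulty))


def influence_maximising_help(capability: float, difficulty: float,
                              help_grid: list[float]) -> float:
    """Help level (from a grid) at which the host's effort matters most."""
    return max(help_grid,
               key=lambda h: host_influence(capability, h, difficulty))


# Assumed grid of possible help levels: 0.0, 0.5, ..., 10.0
HELP_GRID = [0.5 * i for i in range(21)]

# A task well beyond the host, and a task the host can already handle.
hard = influence_maximising_help(capability=2.0, difficulty=5.0,
                                 help_grid=HELP_GRID)
easy = influence_maximising_help(capability=2.0, difficulty=0.5,
                                 help_grid=HELP_GRID)
print(hard, easy)
```

Under these assumptions the agent tops up the hard task only until it sits at the edge of the host’s reach, and offers nothing on the easy one. Maximal help would make success nearly certain whether or not the host tries, so the host’s influence would drop to almost zero. Whether anything like this survives in a realistic setting is exactly the open question.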